How to use the Amazon AWS SDK for Textract with PHP 7.0 Asynchronously

A few days ago, I got an interesting question about my post on using the Amazon AWS SDK for Textract. The question was: “How can I do this with a PDF stored in S3? I know you need to use analyzeDocumentAsynch but unsure how to then get the results of the Asynch operation”.

It turns out to be pretty easy, once you’ve got the synchronous example running. The synchronous Textract example is described in that previous blog post.

Here are the code changes you need to make. Keep all the source code as before, but starting with the call to analyzeDocument, replace that and the following lines with this code:

$promise = $client->analyzeDocumentAsync($options);
$promise->then(
    // $onFulfilled
    function ($value) {
        echo 'The promise was fulfilled.';
        processResult($value);
    },
    // $onRejected
    function ($reason) {
        echo 'The promise was rejected.';
    }
);

function processResult($result) {
    // If debugging, dump the whole response:
    // echo print_r($result, true);
    $blocks = $result['Blocks'];
    // Loop through all the blocks and print the text of each WORD and LINE block.
    foreach ($blocks as $block) {
        // Skip blocks that have no type or no text (PAGE blocks, for example).
        if (empty($block['BlockType']) || empty($block['Text'])) {
            continue;
        }
        $blockType = $block['BlockType'];
        $text = $block['Text'];
        if ($blockType === 'WORD') {
            echo "Word: " . $text . "\n";
        } elseif ($blockType === 'LINE') {
            echo "Line: " . $text . "\n";
        }
    }
}

When you run your PHP code from the command line, you’ll notice a small wait while the asynchronous code processes, and then you’ll see the same output as before.
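If nothing is printed at all, one possible cause is that the script ended before the promise was resolved. One thing to try is blocking on the promise with wait(), which is part of the Guzzle Promises interface that the SDK uses. The following is a minimal sketch rather than the code from my full example: analyzeAndWait is a hypothetical helper name, and $client and $options are assumed to be the Textract client and request array from the previous post.

```php
<?php
// Sketch only: $client is assumed to be the Textract client and $options the
// request array built in the previous post. analyzeAndWait is a made-up name.
function analyzeAndWait($client, array $options)
{
    $promise = $client->analyzeDocumentAsync($options);
    try {
        // wait() blocks until the request has finished and returns the result.
        // If the promise was rejected, wait() throws the rejection reason.
        return $promise->wait();
    } catch (\Exception $e) {
        // Print the reason so that a rejection is not silent.
        echo 'The promise was rejected: ' . $e->getMessage() . "\n";
        return null;
    }
}

// Possible usage, reusing processResult() from above:
// $result = analyzeAndWait($client, $options);
// if ($result !== null) {
//     processResult($result);
// }
```

Printing the exception message is what makes errors such as an unsupported document format visible instead of leaving a blank page.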

Here’s a link to the Guzzle Promises project to give you an idea of how to use Promises in PHP.

And here’s the full source example use of analyzeDocumentAsync.


8 thoughts on “How to use the Amazon AWS SDK for Textract with PHP 7.0 Asynchronously”

  1. Basant Kumar Sharma

    Error executing "AnalyzeDocument" on "https://textract.us-east-1.amazonaws.com"; AWS HTTP error: Client error: `POST https://textract.us-east-1.amazonaws.com` resulted in a `400 Bad Request` response:
    {"__type":"UnsupportedDocumentException","Message":"Request has unsupported document format"}
    UnsupportedDocumentException (client): Request has unsupported document format - {"__type":"UnsupportedDocumentException","Message":"Request has unsupported document format"}

    1. fullstackdev Post author

      Sounds like there’s something wrong with the document you tried to analyze. I’d need more information in order to figure out what went wrong for you.

  2. Mei

    Hi, I tried your code, but it always returns a blank page. It is as if the process skips the promise->then and echoes none of the example output. Can you help me understand why?

    1. fullstackdev Post author

      Mei, I just tested this demo again, and it works fine for me. I can think of a few things that might have gone wrong for you:
      * Make sure that your project is set up the way that I showed it in my previous blog post:
      https://www.fullstackoasis.com/articles/2019/09/16/how-to-use-the-amazon-aws-sdk-for-textract-with-php-7-2/
      * Make sure you’ve got an IAM user set up with the correct credentials.
      * Make sure you’ve set up the “client” as shown in that blog post, and that your secret key is either set up in the php script or in a credentials file.

      I hope this helps! If not, post another comment.

      1. Mei

        Hi, I tried your previous article, which uses an image file and the analyzeDocument function, and it works well. The problem comes when I use analyzeDocumentAsync to analyze a PDF file: I should get the result with promise->then, but it always returns a blank page.
        I tried using promise->wait, but it always returns "unsupported document", even though my file is a PDF and works well when I use the Textract demo.
        I am just wondering why it works well with an image, but not with other files like a PDF.

        1. fullstackdev Post author

          Hello Mei,

          The issue is that the PDF document type is not supported for this API.

          I downloaded a simple PDF file from the web, and then I edited my code to point to that PDF file.

          I saw the message “The promise was rejected.” If you search my code for those words, you will find the line of code in question. I added a little bit of code to show the “reason” for the rejection. Like this:

          echo 'reason: ' . $reason;

          After doing this, I ran the code again, and I saw a big stack trace, including these words: “UnsupportedDocumentException (client): Request has unsupported document format…”

          This suggested that PDFs are not supported. So what is the problem?

          You have to look through the documentation to understand what is going wrong.

          The “AnalyzeDocument” API will take a stream of bytes as input. If you read the documentation for “AnalyzeDocument”, it says: “The input document as base64-encoded bytes or an Amazon S3 object. If you use the AWS CLI to call Amazon Textract operations, you can’t pass image bytes. **The document must be an image in JPEG or PNG format.**”

          If you want to analyze a PDF asynchronously, the file has to be hosted in an S3 bucket, and you have to use StartDocumentAnalysis to initiate the process and then use GetDocumentAnalysis.

          The takeaway is that you have to use a different API for PDF files. Also, PDF files must be hosted in an S3 bucket. So far as I can tell, these APIs don’t work for PDFs on your local machine.
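A rough sketch of that two-step flow might look like the following. I have not run this against a real bucket, so treat it as a starting point under stated assumptions: $client is the Textract client from the previous post, startPdfAnalysis and fetchAllBlocks are hypothetical helper names, the bucket and file names in the usage comment are placeholders, and the polling interval and retry limit are arbitrary choices.

```php
<?php
// Sketch only. $client is assumed to be the Textract client set up as in the
// previous post. Both function names are hypothetical helpers.

// Start an asynchronous analysis of a PDF that is already stored in S3.
// Returns the job ID that Textract assigns to the request.
function startPdfAnalysis($client, $bucket, $name)
{
    $result = $client->startDocumentAnalysis([
        'DocumentLocation' => [
            'S3Object' => ['Bucket' => $bucket, 'Name' => $name],
        ],
        'FeatureTypes' => ['TABLES', 'FORMS'],
    ]);
    return $result['JobId'];
}

// Poll GetDocumentAnalysis until the job leaves IN_PROGRESS, then collect the
// blocks from every page of results by following NextToken.
function fetchAllBlocks($client, $jobId, $pollSeconds = 5, $maxPolls = 60)
{
    $blocks = [];
    $nextToken = null;
    $polls = 0;
    do {
        $params = ['JobId' => $jobId];
        if ($nextToken !== null) {
            $params['NextToken'] = $nextToken;
        }
        $result = $client->getDocumentAnalysis($params);
        $status = $result['JobStatus'];

        if ($status === 'IN_PROGRESS') {
            if (++$polls >= $maxPolls) {
                throw new \RuntimeException("Gave up waiting for job $jobId");
            }
            sleep($pollSeconds);
            continue;
        }
        if ($status === 'FAILED') {
            $message = $result['StatusMessage'] ?? 'no message';
            throw new \RuntimeException("Textract job $jobId failed: $message");
        }

        // SUCCEEDED (or PARTIAL_SUCCESS): keep this page of blocks.
        $blocks = array_merge($blocks, $result['Blocks'] ?? []);
        $nextToken = $result['NextToken'] ?? null;
    } while ($status === 'IN_PROGRESS' || $nextToken !== null);

    return $blocks;
}

// Possible usage, reusing processResult() from the post:
// $jobId = startPdfAnalysis($client, 'my-bucket', 'docs/sample.pdf');
// processResult(['Blocks' => fetchAllBlocks($client, $jobId)]);
```

Polling is the simplest way to wait for the job. The blocks come back in the same shape as the synchronous response, so the existing processResult function should be able to print them.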

          Please let me know if you are able to get it working. I will try to find time for another post that shows how this works.
