Category Archives: SDK

How to use the Amazon AWS SDK for Textract with PHP 7.0 Asynchronously

A few days ago, I got an interesting question about my post which describes using the Amazon AWS SDK for Textract. The question was “How can I do this with a PDF stored in S3? I know you need to use analyzeDocumentAsynch but unsure how to then get the results of the Asynch operation”.

It turns out to be pretty easy, once you’ve got the synchronous example running. The synchronous Textract example is described in that previous blog post.

Here are the code changes you need to make. Keep all the source code as before, but starting with the call to analyzeDocument, replace that and the following lines with this code:

$promise = $client->analyzeDocumentAsync($options);
$promise->then(
    // $onFulfilled
    function ($value) {
        echo "The promise was fulfilled.\n";
        processResult($value);
    },
    // $onRejected
    function ($reason) {
        echo "The promise was rejected.\n";
        // $reason is normally the exception that caused the failure.
        if ($reason instanceof Exception) {
            echo $reason->getMessage() . "\n";
        }
    }
);
// Wait for the promise to settle rather than relying on it being resolved
// when the script shuts down. Passing false stops wait() from re-throwing
// a rejection that the handler above has already reported.
$promise->wait(false);

function processResult($result) {
    // If debugging:
    // echo print_r($result, true);
    $blocks = $result['Blocks'];
    // Loop through all the blocks, printing the text of each WORD and LINE:
    foreach ($blocks as $block) {
        if (empty($block['BlockType']) || empty($block['Text'])) {
            continue;
        }
        if ($block['BlockType'] == 'WORD') {
            echo "Word: " . $block['Text'] . "\n";
        } else if ($block['BlockType'] == 'LINE') {
            echo "Line: " . $block['Text'] . "\n";
        }
    }
}

When you run your PHP code from the command line, you’ll notice a small wait while the asynchronous code processes, and then you’ll see the same output as before.
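
One caveat worth knowing: analyzeDocumentAsync is the SDK’s promise-returning form of the same AnalyzeDocument call. For a multi-page PDF stored in S3, Textract also has job-based operations, which the PHP client exposes as startDocumentAnalysis and getDocumentAnalysis: you start a job, then poll for its result. Here is a minimal sketch of the polling half. The helper name pollUntilDone, its parameters, and the fake responses are my own assumptions for illustration; only the JobStatus values come from the Textract API.

```php
<?php
// Sketch only: poll a Textract-style job until it is no longer IN_PROGRESS.
// $getJob is any callable that returns an array-like result with a
// 'JobStatus' key. With the real SDK it could be, for example:
//   function () use ($client, $jobId) {
//       return $client->getDocumentAnalysis(['JobId' => $jobId]);
//   }
// pollUntilDone() is a hypothetical helper, not part of the SDK.
function pollUntilDone(callable $getJob, $maxAttempts = 30, $delaySeconds = 5)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $result = $getJob();
        if ($result['JobStatus'] !== 'IN_PROGRESS') {
            // SUCCEEDED, FAILED, or PARTIAL_SUCCESS: let the caller decide.
            return $result;
        }
        sleep($delaySeconds);
    }
    throw new RuntimeException("Job still in progress after $maxAttempts attempts.");
}

// Stand-in for the SDK call, so the sketch runs without AWS credentials.
$responses = [
    ['JobStatus' => 'IN_PROGRESS'],
    ['JobStatus' => 'IN_PROGRESS'],
    ['JobStatus' => 'SUCCEEDED', 'Blocks' => [['BlockType' => 'LINE', 'Text' => 'Hello']]],
];
$fakeGetJob = function () use (&$responses) {
    return array_shift($responses);
};

$job = pollUntilDone($fakeGetJob, 5, 0);
echo $job['JobStatus'] . "\n";
```

With the real client, you would start the job with startDocumentAnalysis, passing a DocumentLocation that names the S3 bucket and object, and hand the returned JobId to getDocumentAnalysis. Large results come back in pages, so a complete version would also follow NextToken.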

Here’s a link to the Guzzle Promises project to give you an idea of how to use Promises in PHP.

And here’s the full source for the analyzeDocumentAsync example.

How does S3 generate the URL with putObject method?

Recently, I noticed a question on a forum about the AWS SDK S3Client class.

The person was using the putObject method of S3Client to upload a file to an Amazon S3 bucket.

After that, he needed to figure out the URL which could be used to access that file. He had figured out that an uploaded file called cat.gif could be accessed with the URL “https://s3.eu-west-3.amazonaws.com/aws.mybucket.es/mysite/httpdocs/cat.gif”.

The problem was that when he uploaded a file whose name included special characters, such as an accented o – “ó” – he couldn’t figure out a consistent way to construct the URL. A character with an accent got URL encoded, but the parenthesis character in a file name did not!

He was trying to figure out the implementation details for the putObject method, and couldn’t find any documentation about it.

The answer to his question was that he was asking the wrong question! There’s a software principle that you should “write code to the interface, not to the implementation“.

As consumers of the S3Client API, we should not be trying to figure out the URL to an uploaded file. Rather, we should be asking the interface for the URL. If AWS revealed the details of their URL construction scheme, it would be very painful if they ever decided to change it, both for them and for users of S3. Further, programmers everywhere would be forced to implement the algorithm that AWS declared for URL construction in all the different languages that are supported by the AWS SDK. That’s a lot of duplicated effort.
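
To see how easily a hand-built URL can drift, here is a small illustration using two PHP built-ins that both claim to “URL encode” a string. The file name is made up, and neither function is claimed to match what S3 or the SDK does internally. The point is that they disagree with each other, and that both encode the parentheses, which is not what the person on the forum saw in the real S3 URL.

```php
<?php
// Illustration only: two PHP built-ins that "URL encode" a file name.
// Neither is claimed to match what S3 or the SDK does internally.
// "\u{00F3}" is the accented o, written as an escape so the result does
// not depend on how this source file is saved.
$fileName = "canci\u{00F3}n (1).gif";

$raw  = rawurlencode($fileName); // canci%C3%B3n%20%281%29.gif
$form = urlencode($fileName);    // canci%C3%B3n+%281%29.gif

echo $raw . "\n";
echo $form . "\n";
```

Two standard functions, two different answers, and neither leaves the parentheses alone. Guessing the scheme from a few sample files is fragile.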

Fortunately, AWS gives us an interface that can be used to obtain the URL after a file is uploaded. The result of S3Client->putObject contains an ObjectURL property. We can use that to get the URL, which we can record however we want for later use. Here’s an example:

...
$result = $s3->putObject(...);
$url = $result['ObjectURL'];
...

The full source code for this example of using the S3Client putObject method is on GitHub.

So you see that there’s no need to figure out how AWS implements the URL for our file. AWS gives us the URL immediately when our file is uploaded.

Got comments? Send me an email at fullstackdev@fullstackoasis.com. If you found this interesting, you can hit the subscribe button above. I post new content about once a week.

How to use Amazon AWS Translate with PHP 7.0

Amazon AWS Translate is a pretty cool translation service. You can get started free of charge. Let’s give it a try. This demo assumes you’ve got an AWS account (if not, first go get that). I’m using PHP 7.0 on an Ubuntu 16.04 box.

First, create a new IAM (Identity and Access Management) group. Let’s call it TranslateGroup. Give it TranslateReadOnly permissions. Don’t know how to do this? Sign into your AWS console, and search for “IAM”. That will take you to the right place for dealing with IAM.

Add a new user to this group. Let’s call this user TranslateUser. Give it programmatic access only.

When you see your Access key ID and secret, copy them into your AWS credentials file (in Linux, this is located under ~/.aws/credentials). Set the header for the profile to be [TranslateUser].

Now that you’ve created a user, make sure you’ve installed the AWS PHP SDK. I did this in my demo directory, just by downloading the SDK and unzipping it. The contents of my directory are pretty simple:

~/TranslateDemo$ ls -lairt
total 164
18226436 drwxr-xr-x   3 fullstackdev fullstackdev     4096 Jul 11 15:06 Psr
18226304 drwxr-xr-x   2 fullstackdev fullstackdev     4096 Jul 11 15:06 JmesPath
18226324 drwxr-xr-x   7 fullstackdev fullstackdev     4096 Jul 11 15:06 GuzzleHttp
18226301 -rw-r--r--   1 fullstackdev fullstackdev   129259 Jul 11 15:06 aws-autoloader.php
18226446 drwxr-xr-x 197 fullstackdev fullstackdev    12288 Jul 11 15:06 Aws
 6961244 -rw-rw-r--   1 fullstackdev fullstackdev      958 Sep 16 20:32 test_translate.php
...

It’s quick and easy to code up the rest. Here’s some demo code (test_translate.php):

<?php
require './aws-autoloader.php';

use Aws\Translate\TranslateClient;
use Aws\Exception\AwsException;

// TranslateClient is imported above, so the short class name is enough.
$client = new TranslateClient([
    'profile' => 'TranslateUser',
    'region' => 'us-west-2',
    'version' => 'latest'
]);

// Translate from English (en) to Spanish (es).
$currentLanguage = 'en';
$targetLanguage = 'es';
$textToTranslate = "Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.";

echo "Calling translateText function on '".$textToTranslate."'\n";

try {
    $result = $client->translateText([
        'SourceLanguageCode' => $currentLanguage,
        'TargetLanguageCode' => $targetLanguage,
        'Text' => $textToTranslate,
    ]);
    echo $result['TranslatedText']."\n";
} catch(AwsException $e) {
    // output error message if fails
    echo "Failed: ".$e->getMessage()."\n";
}

Run this from the command line: php test_translate.php. The output is:

Calling translateText function on 'Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.'
Llámame Ishmael. Hace algunos años, no importa cuánto tiempo precisamente— teniendo poco o ningún dinero en mi bolso, y nada particular que me interesara en la costa, pensé que navegaría un poco y vería la parte acuosa del mundo.

Pretty easy, right?

How to use the Amazon AWS SDK for Textract with PHP 7.0

The Amazon AWS Textract API lets you do OCR (optical character recognition) on digital files. It’s actually pretty easy to use, although there’s some prep work.

This post has instructions for using the Textract API with their PHP SDK. I’m using PHP version 7.0 on an Ubuntu 16.04 operating system. This demo works as of September 2019.

Step 1: Create the project

Create a folder for your project, for example:

mkdir ~/TextractDemo ; cd ~/TextractDemo

Instructions for getting started with the SDK for PHP are here. First, download the .zip file as described on that page. Then, extract the zip file to the root of your project. That adds a lot of files and folders to the project root. For example, the “Aws” folder is added. This is what you should see when listing the contents of this directory:

~/TextractDemo$ ls -lairt
total 676
  396747 -rw-r--r--   1 fullstackdev fullstackdev  10129 Sep 12 14:11 README.md
  531373 drwxr-xr-x   3 fullstackdev fullstackdev   4096 Sep 12 14:11 Psr
  396739 -rw-r--r--   1 fullstackdev fullstackdev   2881 Sep 12 14:11 NOTICE.md
  399132 -rw-r--r--   1 fullstackdev fullstackdev   9202 Sep 12 14:11 LICENSE.md
  926072 drwxr-xr-x   2 fullstackdev fullstackdev   4096 Sep 12 14:11 JmesPath
  396755 drwxr-xr-x   7 fullstackdev fullstackdev   4096 Sep 12 14:11 GuzzleHttp
  399129 -rw-r--r--   1 fullstackdev fullstackdev 478403 Sep 12 14:11 CHANGELOG.md
  396748 -rw-r--r--   1 fullstackdev fullstackdev 132879 Sep 12 14:11 aws-autoloader.php
  531270 drwxr-xr-x 203 fullstackdev fullstackdev  12288 Sep 12 14:11 Aws
  396729 drwxr-xr-x   6 fullstackdev fullstackdev   4096 Sep 15 09:48 .
13500418 drwxr-xr-x  46 fullstackdev fullstackdev  20480 Sep 15 09:49 ..

Step 2: Create an IAM User

In order to use the Textract API, you need an Amazon AWS account. So if you don’t have that already, go follow the instructions to do that now.

Assuming you’ve got an AWS account, next, you need to create an IAM (Identity and Access Management) user. If you are signed in to your AWS console, just search for “Identity and Access Management”, and it takes you to the right place to create an IAM user. There’s an area called “Create individual IAM users”. Go there, click the “Manage Users” button, click the “Add User” button, choose a name like TextractUser, and give this user programmatic access only. Once you’ve created the name, go to the next step, where you can add the user to a specific group. Create a group which has the AmazonTextractFullAccess policy name. Name it something like TextractFullAccessGroup, and save that. Add the user you just created to this group. The next step lets you add tags to the user, but you can leave that blank.

In the Review (last) step, you are given the user’s access key ID and secret key (which is hidden – you will have to reveal it to copy it). Save these in a secure place! As the documentation says, “This is the last time these credentials will be available to download. However, you can create new credentials at any time.” (So if you lose them somehow, you can always generate a new set.)

The credentials that you just created may be saved in the file ~/.aws/credentials on Linux systems. Here’s a quick rundown about that file.

If this file already exists, you can add to it. Here’s the documentation for adding lines to an AWS credentials file. On that page, it gives you an example credentials file with this content:

[default]
aws_access_key_id=AKIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

[user1]
aws_access_key_id=AKIAI44QH8DHBEXAMPLE
aws_secret_access_key=je7MtGbClwBF/2Zp9Utk/h3yCo8nvbEXAMPLEKEY


Instead of user1, add the line [TextractUser] (or whatever user name you used in the “creating user” step above). Copy and paste your access key id and secret key as shown.

The credentials file is normally created when you set up the AWS CLI by running aws configure. So if you do not already have a credentials file, the easiest route is to install the CLI and run that command first. Then you can add users to the file.
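
If you want to double check that the profile is really in the file, something like the sketch below can help. The helper name listProfiles is my own invention, not part of the AWS SDK, and the sample text reuses the example keys from the AWS documentation shown above.

```php
<?php
// Sketch: list the profile names found in credentials-file text.
// listProfiles() is a hypothetical helper, not part of the AWS SDK.
function listProfiles($iniText)
{
    // INI_SCANNER_RAW keeps values such as secret keys exactly as written.
    $sections = parse_ini_string($iniText, true, INI_SCANNER_RAW);
    if ($sections === false) {
        throw new RuntimeException('Could not parse credentials text.');
    }
    return array_keys($sections);
}

// Sample text using the documentation's example keys, not real ones.
$sample = <<<'INI'
[default]
aws_access_key_id=AKIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

[TextractUser]
aws_access_key_id=AKIAI44QH8DHBEXAMPLE
aws_secret_access_key=je7MtGbClwBF/2Zp9Utk/h3yCo8nvbEXAMPLEKEY
INI;

$profiles = listProfiles($sample);
echo in_array('TextractUser', $profiles, true) ? "Profile found\n" : "Profile missing\n";
```

To check your real file, pass in file_get_contents(getenv('HOME') . '/.aws/credentials') instead of the sample text.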

Now we’re ready to use Textract. Let’s try to detect text in a sample “document” – the image file shown below. If you are following along, you can right click and save this image, or you can try it on one of your own image files (i.e. one that contains text!).

[Image: test file for Textract]

Step 3: Call Textract using the SDK

You can have Textract analyze images that are in an S3 bucket. However, for demo purposes, that is overkill! It is simpler and quicker to read in an image file as bytes, and send that to Textract for analysis. That’s what we will do.
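
Reading a file as bytes is a one-liner in PHP, but it is also the step most likely to fail quietly if the path is wrong. Here is a minimal sketch of a helper that fails loudly instead. The name readDocumentBytes is hypothetical, and the temporary file stands in for a real image so the snippet runs on its own.

```php
<?php
// Hypothetical helper: read a file's bytes or fail with a clear message.
function readDocumentBytes($filename)
{
    if (!is_file($filename) || !is_readable($filename)) {
        throw new RuntimeException("Cannot read file: $filename");
    }
    $contents = file_get_contents($filename);
    if ($contents === false || $contents === '') {
        throw new RuntimeException("File is empty or unreadable: $filename");
    }
    return $contents;
}

// Demonstration with a temporary file instead of a real image.
$tmp = tempnam(sys_get_temp_dir(), 'doc');
file_put_contents($tmp, "fake image bytes");
$bytes = readDocumentBytes($tmp);
echo strlen($bytes) . " bytes read\n";
unlink($tmp);
```

The returned string is what would go into the 'Bytes' entry of the request options.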

The source code only needs to do three things. First, it needs to create a Textract client. Second, it needs to read in the image file as bytes. Third, the client needs to call the Textract API. Here’s the demo code:

<?php
/*
 * To run this project, make sure that the AWS PHP SDK has been unzipped in the current directory.
 * 
 * Caution: this is not production quality code. There are no tests, and there is no error handling.
 */
require './aws-autoloader.php';

use Aws\Credentials\CredentialProvider;
use Aws\Textract\TextractClient;

// Option 1: read the [TextractUser] profile from your ~/.aws/credentials file.
// (CredentialProvider::env() would read environment variables instead.)
/*
$provider = CredentialProvider::ini('TextractUser');
$client = new TextractClient([
    'region' => 'us-west-2',
    'version' => '2018-06-27',
    'credentials' => $provider
]);
*/
// Option 2: hard-coded keys, for a quick local demo only.
// Replace these placeholder values with your own, and never commit real keys.
$client = new TextractClient([
    'region' => 'us-west-2',
    'version' => '2018-06-27',
    'credentials' => [
        'key'    => 'AKIAI44QH8DHBEXAMPLE',
        'secret' => 'je7MtGbClwBF/2Zp9Utk/h3yCo8nvbEXAMPLEKEY'
    ]
]);

// The file in this project.
$filename = "aws_cli_text_document.jpg";
$file = fopen($filename, "rb");
$contents = fread($file, filesize($filename));
fclose($file);
$options = [
    'Document' => [
		'Bytes' => $contents
    ],
    'FeatureTypes' => ['FORMS'], // REQUIRED
];
$result = $client->analyzeDocument($options);
// If debugging:
// echo print_r($result, true);
$blocks = $result['Blocks'];
// Loop through all the blocks:
foreach ($blocks as $key => $value) {
	if (isset($value['BlockType']) && $value['BlockType']) {
		$blockType = $value['BlockType'];
		if (isset($value['Text']) && $value['Text']) {
			$text = $value['Text'];
			if ($blockType == 'WORD') {
				echo "Word: ". print_r($text, true) . "\n";
			} else if ($blockType == 'LINE') {
				echo "Line: ". print_r($text, true) . "\n";
			}
		}
	}
}
?>

You’ll need to edit this source code to use your own AWS credentials. Once you do that, you should be able to run the code and view the output, as shown here:

php textract_demo.php 
Line: The AWS CLI is updated frequently with support for new services and commands.
Word: The
Word: AWS
Word: CLI
...
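
If you only care about the lines, or only the words, the loop can be folded into a small function. This is a sketch under my own naming: textByBlockType is hypothetical, and the sample array is hand-written to mimic the shape of $result['Blocks'] rather than copied from a real response.

```php
<?php
// Hypothetical helper: collect the Text of every block of one BlockType.
function textByBlockType($blocks, $blockType)
{
    $texts = [];
    foreach ($blocks as $block) {
        if (isset($block['BlockType'], $block['Text']) && $block['BlockType'] === $blockType) {
            $texts[] = $block['Text'];
        }
    }
    return $texts;
}

// Hand-written sample in the same shape as $result['Blocks'].
$blocks = [
    ['BlockType' => 'PAGE'],
    ['BlockType' => 'LINE', 'Text' => 'The AWS CLI is updated frequently'],
    ['BlockType' => 'WORD', 'Text' => 'The'],
    ['BlockType' => 'WORD', 'Text' => 'AWS'],
    ['BlockType' => 'WORD', 'Text' => 'CLI'],
];

echo implode("\n", textByBlockType($blocks, 'LINE')) . "\n";
```

In the demo script, you would call textByBlockType($result['Blocks'], 'LINE') after analyzeDocument returns.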

That’s it!
