Using Adobe PDF Services With Amazon Lambda

This article was originally published in September 2021.

Serverless has been one of the hottest trends for many years now and it’s easy to see why. If I could try to encapsulate a huge topic in one nugget, serverless lets developers focus on code and not infrastructure. Of course, there are a heck of a lot of caveats to that statement, but in general, it’s incredibly freeing as a developer to be able to focus on features and not servers.

One of the most powerful leaders in the serverless space, and most likely the product most credited for kick-starting the serverless movement, is Amazon Lambda. Amazon’s serverless platform is incredibly powerful and complex, supporting everything from small projects to huge enterprise clients.

I’ve been using various serverless-related products for years now (starting with the excellent Apache OpenWhisk) but haven’t ever really built anything with Amazon. We recently added support that makes it easier for AWS developers to work with our services. This process is documented in a blog post we wrote with Amazon: Reimagining Developer and Document Experiences with the Adobe PDF Services API. I thought I’d try this process myself and see what I could build as a new Lambda developer. I want to stress that “new” part there and make it clear I’m really new at this. I was successful, but please keep in mind that there are probably more elegant ways of doing this. Unless my code is perfect, in which case I take all of that back.

What We’re Building

My idea was a rather simple one. One of the features of our PDF Services is the ability to protect a PDF with a password. A popular AWS service, S3, is used as a cloud-based storage system. What if we could automate PDF protections based on files stored in S3? So a user uploads a PDF to an S3 bucket (this is how S3 refers to groups of files and could be considered as a top-level folder for documents) which fires off an event that triggers a Lambda function. The Lambda function will use our PDF Services API to protect the PDF. Finally, the protected PDF is uploaded to a new location.

Cloud File Storage Setup

The first step was to create a bucket for my application. I named it pdfprotecttest1 (because I’m horribly imaginative) and created two folders inside, input and output.

In a real-world application, I’d then build some way for files to get into the input folder. I could use Amplify to build an AWS-hosted website with credentials that let users select PDFs for upload. As my focus here is on the Lambda integration, I didn’t build that out. The S3 console lets you upload files directly to a bucket and there are numerous desktop clients out there that simplify it as well. (I use a piece of software called Cyberduck that has an incredibly cute icon so it must be good.)

Writing the Serverless Function

To create the connection between S3 and Lambda, I followed this tutorial: Using an Amazon S3 trigger to invoke a Lambda function. I won’t repeat the steps as it just plain worked, but I will share the sample Lambda code so you can get an idea of what it looks like:

console.log('Loading function');

const aws = require('aws-sdk');
const s3 = new aws.S3({ apiVersion: '2006-03-01' });

exports.handler = async (event, context) => {
    //console.log('Received event:', JSON.stringify(event, null, 2));

    // Get the object from the event and show its content type
    const bucket = event.Records[0].s3.bucket.name;
    // Keys arrive URL-encoded, with "+" standing in for spaces
    const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
    const params = {
        Bucket: bucket,
        Key: key,
    }; 
    try {
        const { ContentType } = await s3.getObject(params).promise();
        console.log('CONTENT TYPE:', ContentType);
        return ContentType;
    } catch (err) {
        console.log(err);
        const message = `Error getting object ${key} from bucket ${bucket}. Make sure they exist and your bucket is in the same region as this function.`;
        console.log(message);
        throw new Error(message);
    }
};

The important bit here is how the event object is used to figure out what triggered the function. You can see where the bucket and key are retrieved, letting you know exactly what file was stored. In this case, bucket is the top-level S3 storage area, so for me, it’s always pdfprotecttest1. The key is the path. Remember I created an input folder where PDFs would be added, so if a PDF named catsAreAwesome.pdf was uploaded there, the key value would be: input/catsAreAwesome.pdf.
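
To make the event handling a bit more concrete, here is a minimal sketch of that lookup pulled out into a function and run against a hand-written event. The helper name parseS3Event and the trimmed-down sample event are my own assumptions for illustration; a real S3 event carries many more fields.

```javascript
// Hypothetical helper (not part of the tutorial code): extract the bucket
// name and object key from the first record of an S3 event.
function parseS3Event(event) {
    const record = event.Records[0];
    const bucket = record.s3.bucket.name;
    // Keys arrive URL-encoded, with "+" standing in for spaces.
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
    return { bucket, key };
}

// Trimmed-down sample event, assumed for illustration only.
const sampleEvent = {
    Records: [
        {
            s3: {
                bucket: { name: 'pdfprotecttest1' },
                object: { key: 'input/cats+Are+Awesome.pdf' }
            }
        }
    ]
};

console.log(parseS3Event(sampleEvent));
```

With this sample, the decoded key comes back as input/cats Are Awesome.pdf, spaces restored.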

The code then uses the s3 Node library to fetch the file and return the content type.

Now comes the interesting part. In theory, all I had to do at this point was edit the Lambda to do what I wanted. Lambda has a very nice online editor for that, but I also needed to use our Node SDK. For that, I had to work locally and use the AWS CLI to push my changes up.

I began by downloading my Lambda function and placing it in a directory. Once I had my file, I then used npm to install our SDK: npm i @adobe/documentservices-pdftools-node-sdk. I modified the code above to add in the SDK:

const PDFToolsSdk = require('@adobe/documentservices-pdftools-node-sdk');

And then my code. The CLI requires a zip file of my function code and the node_modules folder before I could upload it. In order to make this easier for me, I created a shell script I could run to automate everything:

#!/bin/bash
rm pdf.zip
cd pdfProcess
zip -r ../pdf.zip *
cd ..
aws lambda update-function-code --function-name pdfProcess --zip-file fileb://pdf.zip

Basically, remove any existing zip, go into my Lambda directory and zip up everything, and then use the CLI to update my code.

Testing

How did I confirm things were working? As I said, I connected my S3 bucket to the function. If you click on the “Monitor” tab for your Lambda function, you can then click on the “View logs in CloudWatch” button. This opens a new tab where you can see one log item per invocation.

This was incredibly helpful to me as I would check the latest invocation and see what was output. As you can imagine, I made a few mistakes (I’m not perfect) so this was a really useful way for me to see what my job was doing.

Working With PDFs

To recap, at this point, I’ve got a Lambda connected to S3 such that it fires automatically. I’ve confirmed my function runs and can develop locally and then use the CLI to push up. Here’s how my final solution looks. I’ll share it in bits and pieces and then share the entire file once done.

const bucket = event.Records[0].s3.bucket.name;
const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
/*
the key represents the folder (input/) and filename (foo.pdf), we need the filename for later
*/
const params = {
    Bucket: bucket,
    Key: key,
};
const origFilename = key.split('/').pop();
let data = await s3.getObject(params).promise();

This block determines what file was added to S3 and then uses getObject to fetch it. I also get the original file name (foo.pdf for example) so that when I save the file later, I can keep the same name.

let inputFile = "/tmp/" + nanoid() + '.pdf';
fs.writeFileSync(inputFile, data.Body);
console.log(`wrote to ${inputFile}`);

In order for me to use the Adobe PDF Services SDK, I need a local file for it to work with. Lambda lets you write to /tmp. Whenever I write a temporary file, I typically use a library to give me a unique name. In this case, nanoid is an npm package that makes this pretty simple. The value data.Body comes from the getObject call earlier. Basically, I’m downloading the PDF from S3 and making it available to my Lambda.
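
If you would rather not pull in a dependency just for the file name, here is a minimal sketch of the same idea using only Node built-ins. It swaps nanoid for crypto.randomUUID() (available in recent Node versions), and the helper name writeTempPdf is my own invention. It writes to os.tmpdir(), which I am assuming resolves to /tmp in the Lambda environment.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: write a buffer to a uniquely named .pdf file
// in the temp directory and return the full path.
function writeTempPdf(body) {
    const file = path.join(os.tmpdir(), `${crypto.randomUUID()}.pdf`);
    fs.writeFileSync(file, body);
    return file;
}

// Placeholder bytes standing in for data.Body from the getObject call.
const tempFile = writeTempPdf(Buffer.from('placeholder PDF bytes'));
console.log(`wrote to ${tempFile}`);

// Temp storage can stick around between warm invocations,
// so remove the file once you are done with it.
fs.unlinkSync(tempFile);
```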

At this point the code follows our documentation, albeit with a few differences:

// The key is stored with literal "\n" sequences; turn them back into real newlines
let fixedKey = process.env.ADOBE_KEY.replace(/\\n/g, '\n');
const creds = {
    clientId:process.env.ADOBE_CLIENT_ID,
    clientSecret:process.env.ADOBE_CLIENT_SECRET,
    privateKey:fixedKey,
    organizationId:process.env.ADOBE_ORGANIZATION_ID,
    accountId:process.env.ADOBE_ACCOUNT_ID
}

const credentials = PDFToolsSdk.Credentials.serviceAccountCredentialsBuilder()
    .withClientId(creds.clientId)
    .withClientSecret(creds.clientSecret)
    .withPrivateKey(creds.privateKey)
    .withOrganizationId(creds.organizationId)
    .withAccountId(creds.accountId)
    .build();

In most of our examples, we use a local credentials file and key for authentication. But our PDF Services SDK also supports passing them in as values. Lambda functions support environment variables, so I defined the five values I needed there and then used them in my code above. The only tricky one was the private key, which needs to have newlines added back in before being used.
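
Here is a small, standalone sketch of that newline fix. It assumes the key was saved in the environment variable on a single line, with each line break written as the two characters backslash and n. The helper name restoreNewlines and the placeholder string are mine, not part of the SDK.

```javascript
// Hypothetical helper: replace each literal backslash + "n" pair
// with a real newline character.
function restoreNewlines(value) {
    return value.replace(/\\n/g, '\n');
}

// Placeholder standing in for a multi-line key flattened onto one line.
// In this string literal, "\\n" is a backslash followed by the letter n.
const stored = 'FIRST LINE\\nSECOND LINE\\nTHIRD LINE';

const restored = restoreNewlines(stored);
console.log(restored.split('\n').length); // 3
```

The regular expression needs the doubled backslash so that it matches the literal two-character sequence rather than an actual newline.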

Next, we have some more code, right from the docs:

const executionContext = PDFToolsSdk.ExecutionContext.create(credentials);
// Build ProtectPDF options by setting a User Password and Encryption
// Algorithm (used for encrypting the PDF file).
const protectPDF = PDFToolsSdk.ProtectPDF,
    options = new protectPDF.options.PasswordProtectOptions.Builder()
    .setUserPassword("golfball")
    .setEncryptionAlgorithm(PDFToolsSdk.ProtectPDF.options.EncryptionAlgorithm.AES_256)
    .build();

// Create a new operation instance.
const protectPDFOperation = protectPDF.Operation.createNew(options);
// Set operation input from a source file.
const input = PDFToolsSdk.FileRef.createFromLocalFile(inputFile);
protectPDFOperation.setInput(input);

Of note, I’ve hardcoded a password of golfball. You could make this dynamic of course, but you would need some way to convey to the end-users what the password is. Also, note I’ve set the input to the file I stored in /tmp.
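
As one possible first step away from the hardcoded value, here is a sketch that reads the password from configuration instead. The variable name PDF_USER_PASSWORD and the helper getPdfPassword are assumptions of mine and not something the SDK or Lambda defines. It also doesn't solve the problem of telling end users what the password is.

```javascript
// Hypothetical helper: read the PDF password from an environment-style
// object (process.env by default) and fail loudly if it is missing.
function getPdfPassword(env = process.env) {
    const password = env.PDF_USER_PASSWORD; // assumed variable name
    if (!password) {
        throw new Error('PDF_USER_PASSWORD is not set');
    }
    return password;
}

// Example call with an explicit object so the sketch runs anywhere.
const examplePassword = getPdfPassword({ PDF_USER_PASSWORD: 'golfball' });
console.log(`password has ${examplePassword.length} characters`);
```

In the function itself, the result would be passed to setUserPassword in place of the string literal.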

And now the final bit:

let result = await protectPDFOperation.execute(executionContext);
let outputFile="/tmp/" + nanoid() + '.pdf';
await result.saveAsFile(outputFile);
console.log(`saved to output ${outputFile}`);

const uploadParams = {
    Bucket: bucket,
    Key: `output/${origFilename}`,
    Body: fs.readFileSync(outputFile),
    ContentType:'application/pdf'
};

await s3.putObject(uploadParams).promise();

const response = {
    statusCode: 200,
    body: {result: `Saved protected PDF to ${uploadParams.Key}`},
};
return response;

As before, I’m using nanoid to create a dynamic file name when I store the result of calling our SDK. I then upload this back to S3. Notice how I’m now using the output folder and the original filename. The last thing I do is return a result. No human will see this as the entire process is driven by an S3 modification, but the result could be useful for debugging later.
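
One thing to watch for: the protected file is written back to the same bucket that triggers the function. If the S3 trigger is not limited to the input/ prefix (I am not assuming anything about how yours is configured), the upload to output/ could fire the function again. Here is a minimal sketch of a guard that builds the output key and refuses anything outside input/. The helper name outputKeyFor is hypothetical.

```javascript
// Hypothetical helper: map "input/foo.pdf" to "output/foo.pdf".
// Returns null for keys outside input/ (or folder placeholders)
// so the caller can skip them instead of processing them again.
function outputKeyFor(key) {
    if (!key.startsWith('input/')) {
        return null;
    }
    const filename = key.split('/').pop();
    if (!filename) {
        return null; // e.g. the bare "input/" folder entry
    }
    return `output/${filename}`;
}

console.log(outputKeyFor('input/catsAreAwesome.pdf')); // output/catsAreAwesome.pdf
console.log(outputKeyFor('output/catsAreAwesome.pdf')); // null
```

In the handler, a null result would be a signal to return early before calling the PDF Services API.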

Full Lambda Source

Here’s the entire serverless function:

const aws = require('aws-sdk');
const PDFToolsSdk = require('@adobe/documentservices-pdftools-node-sdk');
const nanoid = require('nanoid').nanoid;

const s3 = new aws.S3({ apiVersion: '2006-03-01' });
const fs = require('fs');

exports.handler = async (event,context,exit) => {

    const bucket = event.Records[0].s3.bucket.name;
    const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
    /*
    the key represents the folder (input/) and filename (foo.pdf), we need the filename for later
    */
    const params = {
        Bucket: bucket,
        Key: key,
    };
    
    const origFilename = key.split('/').pop();
    let data = await s3.getObject(params).promise();
    
    let inputFile="/tmp/" + nanoid() + '.pdf';
    fs.writeFileSync(inputFile, data.Body);
    console.log(`wrote to ${inputFile}`);
    
    // The key is stored with literal "\n" sequences; turn them back into real newlines
    let fixedKey = process.env.ADOBE_KEY.replace(/\\n/g, '\n');
    const creds = {
        clientId:process.env.ADOBE_CLIENT_ID,
        clientSecret:process.env.ADOBE_CLIENT_SECRET,
        privateKey:fixedKey,
        organizationId:process.env.ADOBE_ORGANIZATION_ID,
        accountId:process.env.ADOBE_ACCOUNT_ID
    }
    
    const credentials = PDFToolsSdk.Credentials.serviceAccountCredentialsBuilder()
        .withClientId(creds.clientId)
        .withClientSecret(creds.clientSecret)
        .withPrivateKey(creds.privateKey)
        .withOrganizationId(creds.organizationId)
        .withAccountId(creds.accountId)
        .build();
    
    const executionContext = PDFToolsSdk.ExecutionContext.create(credentials);
    
    // Build ProtectPDF options by setting a User Password and Encryption
    // Algorithm (used for encrypting the PDF file).
    const protectPDF = PDFToolsSdk.ProtectPDF,
        options = new protectPDF.options.PasswordProtectOptions.Builder()
        .setUserPassword("golfball")
        .setEncryptionAlgorithm(PDFToolsSdk.ProtectPDF.options.EncryptionAlgorithm.AES_256)
        .build();
    
    // Create a new operation instance.
    const protectPDFOperation = protectPDF.Operation.createNew(options);
    
    // Set operation input from a source file.
    const input = PDFToolsSdk.FileRef.createFromLocalFile(inputFile);
    protectPDFOperation.setInput(input);
    
    try {
        let result = await protectPDFOperation.execute(executionContext);
        let outputFile="/tmp/" + nanoid() + '.pdf';
        await result.saveAsFile(outputFile);
        
        console.log(`saved to output ${outputFile}`);
        const uploadParams = {
            Bucket: bucket,
            Key: `output/${origFilename}`,
            Body: fs.readFileSync(outputFile),
            ContentType:'application/pdf'
        };
        
        await s3.putObject(uploadParams).promise();
        const response = {
            statusCode: 200,
            body: {result: `Saved protected PDF to ${uploadParams.Key}`},
        };
        return response;
    } catch(e) {
        console.log('error thrown');
        console.log(e);
    }
};

Next Steps

As I described above, I’m incredibly new to Amazon Lambda and there’s a huge set of additional features and services I could use along with PDF Services API to create more powerful applications. I’d love to see your examples or suggestions for ways to enhance this workflow. Be sure to check our docs for more examples of what can be done with our APIs!
