Screenshot system with Node.js and Lambda

Feb 11, 2024

I needed to build a system that takes screenshots to automate the collection of visual data from the Internet. These are my notes on what I did.

Background

I started a new project at my job last week. I can’t share the details here, but briefly, I needed to automate taking screenshots of some materials and storing the image data in the cloud. What I wanted to capture was a table filled in with our service data. The data changes depending on who needs the table and when, so it’s quite dynamic.

I knew there were tools for this, but I had never used them, so I started investigating how to do it as easily as possible. First, I came up with a solution using Figma, but its API is quite limited for now (as of Feb 10, 2024), especially for creating objects. So I changed direction, decided to build the tables in HTML, and started coding.

What I did

First, I quickly wrote the HTML and CSS and tried to take a screenshot of the page. I tested two tools, Satori and Puppeteer.

GitHub - vercel/satori: Enlightened library to convert HTML and CSS to SVG (github.com)

Puppeteer (pptr.dev)

Satori is provided by Vercel for creating Open Graph images, and Puppeteer is a high-level API for controlling headless Chrome/Chromium. Playwright could have been an option instead of Puppeteer, but I read an article saying GitHub uses Puppeteer for its Open Graph images, so I chose Puppeteer over Playwright. I made a simple function for each tool and tested both.

GitHub - atsss/image_generator (github.com)

After testing, I decided to go with Puppeteer because Satori supports only a limited subset of HTML and CSS. I couldn’t use table tags or display: table, so I had to build my table with only div and display: flex. Without those limitations, my HTML and CSS were about 50 lines each. With them, each grew to more than 100 lines. So, to keep the code simple, I chose Puppeteer.
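
To illustrate why plain table tags keep the markup short, here is a minimal sketch of generating such a table from data. The function names (escapeHtml, renderTable) and the data shape are assumptions made for this example; the actual markup from the project is not shown in this post.

```javascript
// Escape characters that have special meaning in HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a plain <table> from column names and row objects.
// The { columnName: value } row shape is a hypothetical choice for this sketch.
function renderTable(columns, rows) {
  const head = columns.map((c) => `<th>${escapeHtml(c)}</th>`).join("");
  const body = rows
    .map((row) => {
      const cells = columns
        .map((c) => `<td>${escapeHtml(row[c] ?? "")}</td>`)
        .join("");
      return `<tr>${cells}</tr>`;
    })
    .join("");
  return `<table><thead><tr>${head}</tr></thead><tbody>${body}</tbody></table>`;
}

const html = renderTable(["name", "status"], [{ name: "A & B", status: "✅" }]);
console.log(html);
```

With Satori, the same structure would have to be expressed as nested div elements with display: flex, which is where the extra lines come from.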

Then I needed to host the HTML and CSS somewhere, because Puppeteer just opens a web page in a headless browser and takes a screenshot of it. So I added a page to our existing service; the server provides the data, and the table changes dynamically. If I had used Satori, I wouldn’t have needed to host anything, because Satori renders markup written as JSX directly to SVG without a browser. So I hope to replace Puppeteer with Satori once its HTML and CSS support improves.

I chose Lambda to run the Node.js function and used SAM to set up a working environment quickly. This was my first time using SAM, but it was much easier than I expected.
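
As a rough sketch of what such a Lambda function could look like when it is invoked through an API Gateway proxy event, the handler below wraps a screenshot step and returns the image as a base64 string in a JSON body. Everything here is an assumption for illustration: createHandler is a hypothetical factory, capture stands in for whatever function produces the base64 image, and the JSON response shape is just one possible choice.

```javascript
// Hypothetical factory: `capture` is any async function that resolves to a
// base64-encoded PNG string, and `pageUrl` is the page to capture.
function createHandler(capture, pageUrl) {
  return async function lambdaHandler(event) {
    try {
      const image = await capture(pageUrl);
      // Lambda proxy integration expects statusCode, headers, and a string body.
      return {
        statusCode: 200,
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ image }),
      };
    } catch (error) {
      // Log the failure and return a generic error to the caller.
      console.error("screenshot failed", error);
      return {
        statusCode: 500,
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ message: "Failed to take screenshot" }),
      };
    }
  };
}
```

Passing capture in as a parameter keeps the handler easy to test locally with a fake function before a real headless browser is involved.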

GitHub - aws/serverless-application-model: The AWS Serverless Application Model (AWS SAM) transform… (github.com)

I followed this article to set up a Node.js v20 environment in Lambda.

Build Serverless APIs with Node.js and AWS Lambda | AppSignal Blog (blog.appsignal.com)

Then I tried to run the test script I shared above, and I got the following error.

<--- Last few GCs --->
al[14:0x5602ec806830]      956 ms: Mark-Compact (reduce) 115.6 (118.3) -> 115.6 (118.3) MB, 69.38 / 0.00 ms  (+ 2.4 ms in 2 steps since start of marking, biggest step 2.2 ms, walltime since start of marking 75 ms) (average mu = 0.175, current mu = 0.114) al[14:0x5602ec806830]     1008 ms: Mark-Compact (reduce) 115.6 (118.3) -> 115.6 (118.3) MB, 46.23 / 0.00 ms  (+ 0.0 ms in 0 steps since start of marking, biggest step 0.0 ms, walltime since start of marking 47 ms) (average mu = 0.149, current mu = 0.113) al

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 0x5602e77c33a3 node::Abort() [/var/lang/bin/node]
 2: 0x5602e76669d3  [/var/lang/bin/node]
 3: 0x5602e7a0a31d v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/var/lang/bin/node]
 4: 0x5602e7a0a6e9 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [/var/lang/bin/node]
 5: 0x5602e7c82f3a  [/var/lang/bin/node]
 6: 0x5602e7c83472 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [/var/lang/bin/node]
 7: 0x5602e7c9caab v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*) [/var/lang/bin/node]
 8: 0x5602e7c9d318 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/var/lang/bin/node]
 9: 0x5602e7c9ecb0 v8::internal::Heap::CollectAllAvailableGarbage(v8::internal::GarbageCollectionReason) [/var/lang/bin/node]
10: 0x5602e7c6ec81 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/var/lang/bin/node]
11: 0x5602e7c48d1e v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [/var/lang/bin/node]
12: 0x5602e811d02d v8::internal::Runtime_AllocateInOldGeneration(int, unsigned long*, v8::internal::Isolate*) [/var/lang/bin/node]
13: 0x5602e85cb2b6  [/var/lang/bin/node]
06 Feb 2024 18:51:54,339 [ERROR] (rapid) Invoke failed error=Runtime exited with error: signal: aborted InvokeID=f9e563a6-0b3b-49de-8b64-231db0d4df70
06 Feb 2024 18:51:54,356 [ERROR] (rapid) Invoke DONE failed: Runtime.ExitError

2024-02-06 18:51:55 127.0.0.1 - - [06/Feb/2024 18:51:55] "GET /screenshot HTTP/1.1" 500 -

So I needed to increase the heap memory limit. Based on the documents below, I set NODE_OPTIONS in the SAM template like this.

Resources:
  ScreenshotFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: lambda_screenshot/
      Handler: app.lambdaHandler
      Runtime: nodejs20.x
      Architectures:
      - x86_64
      Events:
        Screenshot:
          Type: Api
          Properties:
            Path: /screenshot
            Method: get
      Environment:
        Variables:
          NODE_OPTIONS: --max-old-space-size=1536
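
One way to confirm that the setting actually reached the runtime is to read V8's heap limit from inside the function. The snippet below is a minimal sketch: heapLimitMb is a hypothetical helper name, and it relies only on Node's built-in v8 module.

```javascript
// Report V8's current heap ceiling in MB, so you can check whether
// NODE_OPTIONS=--max-old-space-size=... took effect.
async function heapLimitMb() {
  // A dynamic import works from both ES modules (app.mjs) and CommonJS files.
  const v8 = await import("node:v8");
  const { heap_size_limit: limitBytes } = v8.getHeapStatistics();
  return Math.round(limitBytes / (1024 * 1024));
}

heapLimitMb().then((mb) => console.log(`V8 heap limit: ${mb} MB`));
```

Logging this once at the start of the handler makes it easy to compare the value before and after changing the template.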

Node.js 20.x runtime now available in AWS Lambda | Amazon Web Services (aws.amazon.com)

Building Node.js Lambda functions with esbuild (docs.aws.amazon.com)

Then I ran the Lambda again and got another error.

{
  "timestamp":"2024-02-06T20:02:31.899Z",
  "level":"INFO",
  "requestId":"b8181b7e-5dec-460d-b20f-71f79e3fb2b5",
  "message":{
    "errorType":"Error",
    "errorMessage":"Could not find Chrome (ver. 121.0.6167.85). This can occur if either\n 1. you did not perform an installation before running the script (e.g. `npx puppeteer browsers install chrome`) or\n 2. your cache path is incorrectly configured (which is: /root/.cache/puppeteer).\nFor (2), check out our guide on configuring puppeteer at https://pptr.dev/guides/configuration.",
    "stackTrace":[
      "Error: Could not find Chrome (ver. 121.0.6167.85). This can occur if either",
      " 1. you did not perform an installation before running the script (e.g. `npx puppeteer browsers install chrome`) or",
      " 2. your cache path is incorrectly configured (which is: /root/.cache/puppeteer).",
      "For (2), check out our guide on configuring puppeteer at https://pptr.dev/guides/configuration.",
      "    at rV.resolveExecutablePath (/deps/90387c80-da05-4019-80ff-da8e0a010f0b/node_modules/puppeteer-core/src/node/ProductLauncher.ts:434:17)",
      "    at rV.executablePath (/deps/90387c80-da05-4019-80ff-da8e0a010f0b/node_modules/puppeteer-core/src/node/ChromeLauncher.ts:283:19)",
      "    at rV.computeLaunchArguments (/deps/90387c80-da05-4019-80ff-da8e0a010f0b/node_modules/puppeteer-core/src/node/ChromeLauncher.ts:149:31)",
      "    at rV.launch (/deps/90387c80-da05-4019-80ff-da8e0a010f0b/node_modules/puppeteer-core/src/node/ProductLauncher.ts:99:24)",
      "    at r (/private/var/folders/fh/zhsb2cb120l_tc8xv1jth9r40000gn/T/tmp03oa41b3/app.ts:6:23)",
      "    at Runtime.Vfr (/private/var/folders/fh/zhsb2cb120l_tc8xv1jth9r40000gn/T/tmp03oa41b3/app.ts:36:22)"
    ]
  }
}

Long story short, the error says Puppeteer could not find a Chrome installation, so the headless browser couldn’t start. I think the right way to solve this is to write a custom Dockerfile that installs Chrome properly. But my main JS file was quite short, fewer than 30 lines, and I didn’t want to maintain a Dockerfile for such a small task. So I looked for another way and found a package that provides Chromium for Lambda.

GitHub - Sparticuz/chromium: Chromium (x86-64) for Serverless Platforms (github.com)

There were some older, similar packages, but this one is currently the main package for running a headless browser in Lambda. So I installed it in my project and changed the launch settings like below.

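// Imports assumed from the @sparticuz/chromium README, which pairs it with puppeteer-core.
import chromium from "@sparticuz/chromium";
import puppeteer from "puppeteer-core";

// Launch the Chromium binary provided by the package instead of a local Chrome install.
const browser = await puppeteer.launch({
  args: chromium.args,
  defaultViewport: chromium.defaultViewport,
  executablePath: await chromium.executablePath(),
  headless: chromium.headless,
});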
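
The launch settings above only start the browser. A minimal sketch of the rest of the flow (open the page, capture it, and always close the browser) could look like the following. captureScreenshot and its launch parameter are hypothetical names for this example; the Puppeteer calls used are newPage, goto, screenshot, and close.

```javascript
// `launch` is a hypothetical zero-argument function that resolves to a
// Puppeteer browser, for example: () => puppeteer.launch({ ... }).
async function captureScreenshot(launch, url) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    // Wait until the network is idle so server-provided data has loaded.
    await page.goto(url, { waitUntil: "networkidle0" });
    // Ask Puppeteer for a base64 string instead of a binary buffer.
    return await page.screenshot({
      type: "png",
      fullPage: true,
      encoding: "base64",
    });
  } finally {
    // Always close the browser so Chromium does not linger after the call.
    await browser.close();
  }
}
```

In a Lambda, it would be called with the launch settings from this post wrapped in a function, and the page URL would be the hosted table page. Keeping the launch step as a parameter also lets you try the flow locally with a regular Chrome install.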

Then I ran the Lambda and got the following error.

{
  "timestamp":"2024-02-11T10:41:11.318Z",
  "level":"INFO",
  "requestId":"2f7f7779-17d3-4cb3-8197-8311316e89e9",
  "message":{
    "errorType":"Error",
    "errorMessage":"The input directory \"/var/bin\" does not exist.",
    "stackTrace":[
      "Error: The input directory \"/var/bin\" does not exist.",
      "    at Function.executablePath (/deps/90387c80-da05-4019-80ff-da8e0a010f0b/node_modules/@sparticuz/chromium/build/index.js:241:19)",
      "    at getEncodedImage (/private/var/folders/fh/zhsb2cb120l_tc8xv1jth9r40000gn/T/tmpnf6fz4i8/app.ts:10:40)",
      "    at Runtime.swA (/private/var/folders/fh/zhsb2cb120l_tc8xv1jth9r40000gn/T/tmpnf6fz4i8/app.ts:33:28)",
      "    at Runtime.handleOnceNonStreaming (file:///var/runtime/index.mjs:1173:29)"
    ]
  }
}

Honestly, I didn’t understand what the error meant. However, I looked through the issues on the repository and found comments saying that something was wrong with Node.js v20.x and TypeScript. So I downgraded Node.js to v18.x and disabled TypeScript, and the error went away.

Then the Lambda completed successfully, but the emojis were missing from the screenshot. In my case, emojis are an important part of the tables. So I followed the README and loaded an emoji font like below.

await chromium.font(
  "https://raw.githack.com/googlei18n/noto-emoji/master/fonts/NotoColorEmoji.ttf"
);

GitHub - Sparticuz/chromium: Chromium (x86-64) for Serverless Platforms (github.com)

However, it didn’t work. After a quick investigation, I found an issue similar to mine.

[BUG] chromium.font not working · Issue #207 · Sparticuz/chromium (github.com)

There seems to be a problem with downloading fonts from a CDN. So I downloaded the Noto Color Emoji .ttf file and placed it in the Lambda package like below.

|____lambda_screenshot
| |____package.json
| |____app.mjs
| |____fonts
| | |____NotoColorEmoji.ttf

Then I changed the code to load the font like below.

await chromium.font("/var/task/fonts/NotoColorEmoji.ttf");
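
The /var/task prefix is where Lambda unpacks the function code. If you would rather not hard-code it, a small helper can build the path from the LAMBDA_TASK_ROOT environment variable that the Lambda runtime sets. This is only a sketch: emojiFontPath is a hypothetical name, and the fallback value is an assumption for runs where the variable is missing.

```javascript
// Resolve the bundled font relative to the Lambda task root.
// LAMBDA_TASK_ROOT is set by the Lambda runtime; "/var/task" is the fallback.
function emojiFontPath(env = process.env) {
  const root = env.LAMBDA_TASK_ROOT || "/var/task";
  return `${root}/fonts/NotoColorEmoji.ttf`;
}

console.log(emojiFontPath());
```

The result can then be passed to chromium.font in place of the literal path.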

Finally, the screenshots displayed the emojis.

That’s it!

Written by Ats
