An AWS Lambda Advent-ure in Python

30 November 2020

Let’s build a web-based, programming-related advent calendar using AWS Lambda, Python, Google Sheets and CDK. No chocolates, I’m afraid, but along the way we’ll learn a bit about Serverless Python.

For many of us, Advent is a time to look forward; a countdown to Christmas; 24 days in December at the end of the year that we mark with special calendars. Hiding behind little windows in a festive picture are sketches, quotes, or perhaps even chocolates! I've built a simple AWS Lambda application to publish a custom Advent calendar with programming videos behind the windows.

I've published the code for this on my GitHub profile so you can clone it and run your own calendar.

There are three areas I want to focus on with this exercise:

  1. Programming – I'm picking Python…
  2. Deployment – let's use AWS Lambda to keep the future days secret
  3. Infrastructure – we'll use the CDK to define our infrastructure in code

Our calendar needs a data source. Rather than deploying a CMS, I'm using Google Sheets, so no programming skills are required to update the contents of our calendar.

Our calendar application will fetch this data and generate an HTML page from it. You can see the published calendar here (apologies for the ugly URL).

If you want to simplify your deployment and avoid the Google Sheets integration, the code also supports loading the data from a bundled CSV file (which makes it a little faster too).

Our Application

Let's step through the production code.

Since we're planning to use an AWS Lambda Function for this, we have a single entry point for our application. That's defined in functions/advent/handler.py as def handler(). It receives an event object and some context, both of which come from API Gateway in our solution.

Our handler does only four things. It

  1. fetches the calendar data from Google Sheets
  2. checks to see if it's in preview mode
  3. builds the calendar webpage if the request wants HTML or,
  4. returns the calendar data as JSON otherwise.

We call out to other modules that do the work for us: one fetches the Google Sheet data and the other builds the HTML page. Our handler function only deals with Lambda-specific details.

def handler(event, context):
    advent_data = get_sheet_data(parse_to_shape)

    show_all_dates = should_show_preview(event)

    if wants_html(event):
        return success_response(
            build_advent_calendar(advent_data, show_all=show_all_dates), "text/html"
        )

    return success_response(json.dumps(advent_data), "text/json")
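The three helpers called by the handler aren't shown here. As a rough sketch only, and assuming the event arrives in API Gateway's proxy format (a "headers" dictionary and an optional "queryStringParameters" dictionary), they might look something like this. The "preview" query parameter name is my own placeholder, not something taken from the repository.

```python
def wants_html(event):
    """Return True when the request's Accept header mentions text/html."""
    headers = event.get("headers") or {}
    # Header names can arrive in any case, so normalise before the lookup.
    accept = {name.lower(): value for name, value in headers.items()}.get("accept", "")
    return "text/html" in accept


def should_show_preview(event):
    """Return True when the (hypothetical) ?preview=true parameter is present."""
    params = event.get("queryStringParameters") or {}
    return params.get("preview", "").lower() == "true"


def success_response(body, content_type):
    """Build the dictionary a Lambda proxy integration expects back."""
    return {
        "statusCode": 200,
        "headers": {"Content-Type": content_type},
        "body": body,
    }
```

Keeping these as small pure functions means the handler can be tested by passing in plain dictionaries, with no AWS services involved.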

get_sheet_data() wraps the Google Sheets API (which is available as a Python module). That API requires credentials, which I've stored separately in AWS Secrets Manager. Fetching the credentials and creating the Google service takes a wee while, so the first run of our Lambda can take a couple of seconds. Subsequent runs are faster, though: the sheet data is cached in memory, taking advantage of the way the Lambda service reuses warm execution environments.

def get_sheet_data(data_mapper):
    global simple_caching_sheet_data
    if cache_expired_or_empty():
        print("fetching sheet as cache expired")
        request = (
            service.spreadsheets()
            .values()
            .get(spreadsheetId=spreadsheet_id, range=range_)
        )
        simple_caching_sheet_data = request.execute()
    return data_mapper(simple_caching_sheet_data)

You'll see here that I'm injecting a mapper to convert the data returned from the sheet into a data structure I can use to build the calendar. This allows me to potentially re-use this module elsewhere with differently shaped spreadsheets. It also makes testing a little easier as I can decouple fetching from re-shaping.
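To make the mapper idea concrete, here is a minimal sketch of what a function like parse_to_shape could do. The Sheets values response really does carry its rows under a "values" key, but the column layout below (title and image URL in the first row, then one row per day) is an assumption for illustration, as are the dictionary keys.

```python
def parse_to_shape(sheet_response):
    """Reshape a Sheets `values` response into [title, image, dates].

    Assumed layout (hypothetical): row 1 holds the calendar title and the
    background image URL; every later row holds day, caption and video URL.
    """
    rows = sheet_response.get("values", [])
    if not rows:
        raise ValueError("sheet contained no rows")
    title, image = rows[0][:2]
    # The Sheets API omits trailing empty cells, so skip rows that are too short.
    complete_rows = (row[:3] for row in rows[1:] if len(row) >= 3)
    dates = [
        {"day": int(day), "caption": caption, "url": url}
        for day, caption, url in complete_rows
    ]
    return [title, image, dates]
```

Because the mapper only ever sees a dictionary, a test can hand it a hard-coded response and never touch the network.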

A screenshot of our data source, the Google Sheet
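The caching in get_sheet_data() relies on module-level state surviving between invocations while the Lambda stays warm. The real cache_expired_or_empty() isn't shown in this post, so the following is only a sketch of one way to do it, with a made-up five minute lifetime and a small class instead of a bare global.

```python
import time


class SheetCache:
    """Hold one fetched value and report when it is missing or too old."""

    def __init__(self, ttl_seconds=300, clock=time.time):
        self.ttl_seconds = ttl_seconds  # hypothetical lifetime: five minutes
        self.clock = clock              # injectable so tests can control time
        self.data = None
        self.fetched_at = None

    def expired_or_empty(self):
        if self.data is None:
            return True
        return self.clock() - self.fetched_at > self.ttl_seconds

    def store(self, data):
        self.data = data
        self.fetched_at = self.clock()


# Created once when the module is imported, then reused by warm invocations.
sheet_cache = SheetCache()


def get_cached(fetch, cache=sheet_cache):
    """Return the cached value, calling fetch() only when the cache is stale."""
    if cache.expired_or_empty():
        cache.store(fetch())
    return cache.data
```

A cold start imports the module afresh, so the cache begins empty and the first request pays for the fetch.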

Looking at the google_credentials module, there's some copypasta code from AWS for extracting the credentials from Secrets Manager. fetch_credentials() checks the environment to see if it's being called from Lambda (on Lambda, the AWS_EXECUTION_ENV value is set). If so, it uses Secrets Manager; otherwise, it uses a local credentials file.

# inside fetch_credentials
    if os.environ.get("AWS_EXECUTION_ENV") is None:
        credentials = get_local_credentials()
    else:
        credentials = get_secret_credentials()
    return credentials
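For the Secrets Manager branch, a stripped-down sketch might look like the one below. It uses boto3's get_secret_value call and assumes the secret was stored as a JSON string; the secret name is a placeholder, and the optional client argument is my own addition so the function can be exercised without AWS.

```python
import json


def get_secret_credentials(secret_id="advent/google-credentials", client=None):
    """Load Google credentials stored as a JSON string in Secrets Manager.

    The default secret_id is a hypothetical name. Pass a client to avoid
    creating a real boto3 client (useful in tests).
    """
    if client is None:
        import boto3  # imported lazily so local runs don't need boto3 installed

        client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])
```

AWS's generated sample code also handles binary secrets and specific error codes, which this sketch leaves out.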

Moving on to the build_advent_calendar() code, I'm being somewhat unconventional by modern standards. Back in the good old days (for example 2010) we rendered HTML on the server and added behaviours through JavaScript in the client. Today, we'd do all that work in the client by sending a React (or Angular or Vue.js etc.) application to the browser and fetching our data as JSON from a JavaScript call. As an old guy, I've grown to value the principle of least power, particularly with regard to web technologies, so here I'm not using any JavaScript to render my calendar. I'm even using a pure CSS solution to show and hide my window content. This should mean that my web page loads lightning fast (once the Google connection is warmed) and is more robust, with less chance of a bug being introduced. If you haven't already, I'd recommend reading Jeremy Keith on this topic. I have had to add some JavaScript to support YouTube videos.

My “server-side” rendering is done using built-in features of Python: the Template class and f-strings. If this system were to require more complex HTML or styling I would probably introduce a templating framework, but this only requires three separate components: the page itself, the list of windows and the list of panels. Here's a snippet that shows loading a template from a file and using Python's Template class.

def build_advent_calendar(advent_data, show_all):
    title, image, dates, *extras = advent_data
    with open(pathlib.Path(__file__).parent.absolute() / "calendar.html") as template:
        template_object = Template(template.read())
        html = template_object.substitute(
            CALENDAR_TITLE=title,
            BG_IMAGE=image,
            WINDOW_LIST=build_window_list(),
            PANEL_LIST=build_panel_list(dates, show_all=show_all),
        )
    return html
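If you haven't used string.Template before, this tiny standalone example shows the substitution behaviour the snippet above depends on. The template text and values are invented for the demonstration.

```python
from string import Template


def render_page(title, window_list):
    """Fill a tiny inline template using $NAME placeholders."""
    page = Template("<h1>$CALENDAR_TITLE</h1><ul>$WINDOW_LIST</ul>")
    # substitute() raises KeyError if any placeholder is left without a value.
    return page.substitute(CALENDAR_TITLE=title, WINDOW_LIST=window_list)


print(render_page("Advent of Videos", "<li>1</li>"))
# prints <h1>Advent of Videos</h1><ul><li>1</li></ul>
```

One reason Template suits HTML files is that CSS curly braces don't clash with $-style placeholders the way they would with str.format(); a literal dollar sign in the template is written as $$.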

And here's a snippet of my code to build the windows:

def build_window_list():
    boxes = [build_window(day) for day in range(1, 25)]
    return "".join(boxes)

Infrastructure

So that's our application. Let's look at the infrastructure to host it.

Our infrastructure is pretty simple. We have an AWS Lambda function that's hooked up to an API Gateway. We could have written the CloudFormation template for this in YAML pretty easily. But since we're Python developers, we'll write it using the CDK.

The AWS CDK supports TypeScript, JavaScript, Python, Java, and C#/.NET, but just about everyone who writes about it writes using TypeScript. I've used Python here, and the experience was a little less smooth because I didn't have TypeScript's static typing and WebStorm's excellent support for it.

As an example, when I set up my integration between API Gateway and Lambda, CDK expects the integration to be of type IHttpRouteIntegration. My integration is a LambdaProxyIntegration, which is one of those, but PyCharm doesn't see it that way and warns me that the types don't match.

api.add_routes(
    path="/",
    methods=[HttpMethod.GET],
    integration=(LambdaProxyIntegration(handler=advent_function)),
)

You do get some auto-complete features and parameter hints, but they aren't as helpful as what TypeScript and WebStorm would give you.

One other thing of interest in our CDK code: we're using context to inject variables (the Google Sheet Id and Range). If we were writing pure CloudFormation we'd probably do this using CfnParameters, but the developers of the CDK recommend against this approach. By using context we have access to the properties at synth time (rather than deploy time), so we can use the values in flow-control or other parts of our application.

We can provide context values in various ways, such as the cdk.json file or command line parameters (see the CDK documentation for more details).
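As an illustration of reading context at synth time, here's a small helper built around the construct tree's try_get_context() method, which returns None when a key is absent. The context key names, the environment variable names and the default range are all placeholders of mine. Inside a stack you would pass self.node as the argument.

```python
def sheet_settings_from_context(node):
    """Read the sheet id and range from CDK context while synthesising.

    `node` is a construct's `self.node`; anything exposing try_get_context()
    works, which keeps this helper testable without the CDK installed.
    """
    sheet_id = node.try_get_context("sheet_id")        # hypothetical key
    sheet_range = node.try_get_context("sheet_range")  # hypothetical key
    if not sheet_id:
        # Plain Python flow control: this fails at synth time, before any deploy.
        raise ValueError("set the sheet_id context value, e.g. in cdk.json")
    return {
        "SHEET_ID": sheet_id,
        "SHEET_RANGE": sheet_range or "A1:C25",  # hypothetical default
    }
```

The returned dictionary could then be handed to the Lambda function as its environment variables. On the command line, context is supplied with the CDK's -c key=value option.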

Deployment

Our next challenge is how to package our code and get it deployed into our infrastructure.

I am probably not doing this in the most “pythonic” way.

There are recommended tools for packaging, deploying and distributing Python applications. The most common distribution tool is setuptools. In this application, I have not used setuptools as there's no clear pathway for using it together with CDK to deploy to AWS Lambda.

Instead, I have used a simple Makefile to provide a number of build steps that allow me to deploy my application with a single command.

I'm using Pipenv to manage my dependencies. Pipenv has benefits on top of pip for deployment, as it allows me to differentiate between development dependencies (CDK and pytest packages for example) and production dependencies (for example the Google API client).

In my CDK code, I can specify a zip file as my source asset for my Lambda Function, and I can use my Makefile to build that package.

pipenv run pip install -r <(pipenv lock -r) --target _build
cp -R functions _build
${PYTHON} -m zipdir _build

My 'build' step installs my production dependencies into a build directory. Then it copies my function code into that directory. Finally it zips that directory. My CDK stack targets that zip file.
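The zipping step is easy to reproduce with the standard library. The zipdir module called in the Makefile isn't shown, so the function below is a stand-in for the idea rather than a copy of it. The one detail that matters for Lambda is that paths inside the archive are relative to the build directory, so the handler module sits where the runtime expects to import it from.

```python
import pathlib
import zipfile


def zip_directory(source_dir, zip_path):
    """Zip everything under source_dir with paths relative to source_dir.

    Write zip_path outside source_dir so the archive doesn't include itself.
    """
    source = pathlib.Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store "functions/advent/handler.py", not "_build/functions/...".
                archive.write(path, path.relative_to(source).as_posix())
    return zip_path
```

Calling zip_directory("_build", "_build.zip") would produce an archive suitable for use as a Lambda code asset, assuming the dependencies and function code have already been copied into _build.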

Putting that all together: once I've stored my Google credentials in Secrets Manager, created a Google Sheet and added its Id to cdk.json, I can deploy my advent calendar by typing make build deploy in my terminal.

Step 1, release. Step 2, fix

As with any application there are improvements that could be made.

The Sheets API setup makes the Lambda function quite slow on its first run (though subsequent calls are much faster). I could trade away the ease of editing and bundle a CSV file with my Lambda instead (in fact there's code for this already).
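The repository's CSV code isn't reproduced in this post, but the idea can be sketched in a few lines. This version assumes the CSV mirrors the sheet's rows and wraps them in the same {"values": [...]} shape the Sheets API returns, so whatever mapper is used for the sheet could be reused unchanged. The function name is hypothetical.

```python
import csv
import io


def load_csv_data(csv_text):
    """Parse CSV text into the same shape as a Sheets `values` response."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    return {"values": rows}
```

For a bundled file you would read the text first, for example with pathlib.Path(...).read_text(), and pass it in. There are no credentials to fetch and no network call, which is where the speed-up comes from.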

There's no DNS integration here, so that needs to be done separately. It's straightforward to add custom domains to API Gateway. Alternatively, an integration with CloudFront would let you add additional caching and integration with an existing website.

If you were so inclined, you could develop your own SPA web-app. The existing Lambda code will work out-of-the-box with that approach; all you need to do is ensure your HTTP request headers ask for JSON.
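A SPA would make this request from JavaScript, but the same call is easy to try from Python while exploring the API. This sketch assumes the handler decides between HTML and JSON by looking at the Accept header; the URL is whatever your own deployment gives you, and the opener argument is my addition for testing.

```python
import json
import urllib.request


def fetch_calendar_json(url, opener=urllib.request.urlopen):
    """Request the calendar data, asking for JSON rather than HTML."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the Accept header doesn't mention text/html, a handler that checks for it would fall through to its JSON branch.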

Need help? python -m instil course.py

If building Serverless applications with Python is something you need to do, Instil offers training in Introductory Python, Advanced Python and Python for Serverless. We deliver virtually to companies all over the world and are happy to customise our courses to your team’s level and specific needs. Come and check us out to see if we can help you and your team.

Article By
Ryan Adams

Software Trainer