How to use AWS CodeBuild as a CI for your Python project

Guillaume Androz
Published in The Startup
8 min read · Apr 11, 2020


The Phoenix Project by Gene Kim

In a previous post, I presented the trend that started somewhere around 2017, also known as DevOps. One of the pillars of this new mantra is continuous integration and deployment (CI/CD), which lets devs continuously test their code to ensure that everything is always up and running. The goal is to maintain a branch (or maybe we should call it the trunk) of your code repository that is always in a state ready to be deployed.

How can we ensure that our code is always deployable? Well, we must continuously test it! That's exactly where tools like GitLab are very useful... but there are some very annoying limitations.

First, if you work on a huge project that runs hundreds of tests, it can be very slow. And secondly, if like me you need some very big reference files for your application to process, you need lots of computing power. Often, you get kicked out by GitLab (or CircleCI, or whatever tool you use) before the tests even finish, which is very frustrating. The only thing you can do then is run the tests on a local machine to make sure that the long-lasting unit tests or the resource-hungry integration tests pass. That is OK if there are not many devs, but it does not scale. So what could be a solution?

AWS CodeBuild

I like AWS, and while I am not an expert, I always find new services that fit my needs. Maybe Amazon devs already had the same issues? Maybe I'm not alone?

While I could not find an exact solution for my use case, I was able to build one, and it works very well! The idea was to use AWS CodeBuild, because

AWS CodeBuild is a fully managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy.

That's exactly what I need! But how is it different from GitLab, CircleCI or Jenkins? It seems that AWS CodeBuild also runs your code in a Docker container, so why does it suit my needs better?

The answer can be found in the AWS “ecosystem”. AWS CodeBuild is indeed the test engine, but it can interact with all other AWS services, and that’s its power.

Reference files management

AWS S3 service

As I mentioned, I have long-lasting tests and some integration tests that process huge data files. The solution with GitLab or similar services is to create our own Docker image and push it to Docker Hub. Fortunately, we can base our image on the one provided by our test engine. Our custom image then contains all the pre-built packages and our reference files. However, it is very annoying and time consuming to rebuild and push an image every time we update or add a reference file. And in the end, we still cannot run all our tests because of resource limitations.

With AWS CodeBuild, we can also use our own Docker image. But while we can pick a powerful AWS EC2 instance to run the tests, that alone does not fix the reference file management issue.

But here is the hack. With AWS CodeBuild, you can specify two code sources.

All we have to do is use our source code repository as the primary source, and an AWS S3 bucket as the secondary source. That way, we simply update the files directly in the bucket instead of building a new Docker image. Awesome!
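Inside the build container, CodeBuild mounts each source in its own directory and advertises it through an environment variable: the primary source lives at CODEBUILD_SRC_DIR, and a secondary one at CODEBUILD_SRC_DIR_<sourceIdentifier>. A minimal sketch of how tests can locate the reference files (the identifier `references` is an assumption, set in the project configuration):

```shell
# The secondary source identifier "references" is an assumed name; CodeBuild
# exposes that source's directory as $CODEBUILD_SRC_DIR_references.
# The fallback path lets the same code run on a local machine.
REFERENCES_DIR="${CODEBUILD_SRC_DIR_references:-/tmp/references}"
echo "reference files are expected under: $REFERENCES_DIR"
```

Tests can then read the big reference files from `$REFERENCES_DIR` without those files ever being baked into the Docker image.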

Reporting with AWS CodeBuild

AWS CodeBuild service

In a DevOps context, and especially in a regulated environment, reporting is crucial. We need to prove to auditors or the QA team that everything runs well and that we have sound code ready to be deployed.

The AWS CodeBuild console provides us a rapid overview of the status of our tests, helping us to quickly spot and fix any bug before deployment.

AWS CodeBuild console

AWS announced in late 2019 a new reporting feature in AWS CodeBuild. This feature allows us to view the reports generated by our tests, in either the JUnit XML or the Cucumber JSON format. Just as we can generate an HTML report with pytest-html, we can view metrics such as the pass rate, the test run duration, and the number of passed versus failed/errored test cases in one location.

Test reports created by AWS CodeBuild are stored in a Report Group, where they are kept for 30 days. We can also archive the reports in an Amazon S3 bucket. Each test report is further broken down into individual test cases.

AWS CodeBuild Reports Groups console

In the Reports tab of the current build, we can find the reports generated by our tests. In the example above, two reports are generated: one for the test coverage and one for the unit and functional tests.

Each report in each report group can be further expanded to spot whether, and which, tests failed.

AWS CodeBuild Report console

In the example above, we see that our test run succeeded, with 86 passed tests, 6 skipped, and no failures. Furthermore, we can expand each test case and look at its associated log, if any.

Finally, we can view the artifacts that have been saved into the AWS S3 bucket we specified at configuration time. Artifacts are kept there as long as we do not delete them from the bucket, which ensures traceability for QA or regulatory audits.

AWS CLI to orchestrate the whole thing

Core of the project: the buildspec file

As I am a somewhat lazy person, I don't like doing the same thing twice, especially when it can be automated or at least scripted. So first of all, here is a toy project in Python that we can use to try out AWS CodeBuild.

dummy_project/
├── buildspec.yml
├── setup.py
├── README.md
├── .gitignore
├── src/
│   ├── __init__.py
│   └── app.py
└── tests/
    ├── __init__.py
    ├── test_app.py
    └── requirements.txt

Nothing exceptional here, just a tiny Python project with a bunch of tests. The most important part of this toy project is the buildspec.yml file, which details the commands to run during a build.

A buildspec is a collection of build commands and related settings, in YAML format, that CodeBuild uses to run a build. Without a build spec, CodeBuild cannot successfully convert your build input into build output or locate the build output artifact in the build environment to upload to your output bucket.

Our toy project's buildspec reads as follows:

https://gist.github.com/gandroz/7e0414a7b557270d8480298e3775873b
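In case the gist does not render, here is a minimal sketch of what such a buildspec can look like; paths and report group names are placeholders, not necessarily the author's exact file:

```yaml
version: 0.2

phases:
  pre_build:
    commands:
      # Build a virtual environment, then install the test requirements
      # and the package under test
      - python -m venv .venv
      - . .venv/bin/activate && pip install -r tests/requirements.txt && pip install -e .
  build:
    commands:
      - >
        . .venv/bin/activate &&
        python -m pytest -s -v
        --html=reports/pytest_report.html --self-contained-html
        --cov=src --cov-report=xml:reports/coverage.xml
        --junitxml=reports/junit_report.xml
        tests/

reports:
  test-report:
    files:
      - junit_report.xml
    base-directory: reports
    file-format: JUNITXML
  coverage-report:
    files:
      - coverage.xml
    base-directory: reports
    file-format: COBERTURAXML

artifacts:
  files:
    - reports/pytest_report.html
```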

Note that we did not specify here where to find the source code; we will come to that shortly.

The first part of the buildspec contains the pre-build commands. These commands install everything needed to run the tests. In our case, that means creating a virtual environment and installing all the required Python packages, including, of course, the package under test.

Next come the build commands. For our Python toy project, the build command is just a call to pytest with the following options:

  • --html to generate an HTML report we can store as an artifact
  • --self-contained-html to include all the needed CSS and JS directly in the HTML report for offline reading
  • -s and -v to increase verbosity and segregate outputs during tests
  • --cov to generate a coverage report
  • --cov-report to indicate where to store the coverage report
  • --junitxml to generate a JUnit report used by AWS CodeBuild to populate Reports in a Reports Group

Finally, we find the reports section, where we declare the two CodeBuild reports we want to generate: one for the test coverage and one for the test results. We'll see how to tell AWS CodeBuild to also store the artifacts in AWS S3.

Create AWS CodeBuild project with CLI

As I said, I like to use scripts to build things for repeatability purposes, and that's where the AWS CLI comes into play. The CLI command is really simple:

aws codebuild create-project [options]

I'll let you discover all the possible options in the documentation, but here is the JSON file I used to create the project:

Build project creation configuration file
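The file itself is not reproduced here, so here is a minimal sketch of what such a create-project input file can contain; all names, locations, and ARNs are placeholders, not the author's actual values:

```json
{
  "name": "dummy_project",
  "source": {
    "type": "BITBUCKET",
    "location": "https://bitbucket.org/your-team/dummy_project.git",
    "buildspec": "buildspec.yml"
  },
  "secondarySources": [
    {
      "type": "S3",
      "location": "your-reference-files-bucket/references/",
      "sourceIdentifier": "references"
    }
  ],
  "artifacts": {
    "type": "S3",
    "location": "your-artifacts-bucket"
  },
  "environment": {
    "type": "LINUX_CONTAINER",
    "image": "aws/codebuild/standard:4.0",
    "computeType": "BUILD_GENERAL1_LARGE"
  },
  "serviceRole": "arn:aws:iam::123456789012:role/CodeBuildServiceRole",
  "logsConfig": {
    "cloudWatchLogs": {
      "status": "ENABLED",
      "groupName": "dummy-project-build-logs"
    }
  }
}
```

The project is then created with `aws codebuild create-project --cli-input-json file://create_project.json` (the file name is an assumption).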

I won’t detail all the sections but I refer you to the documentation.

The important parts are the source sections. As said, there is one source for the source code (a Bitbucket repository in my case) and an AWS S3 bucket as a second source, where I put my reference files. Note that to allow AWS CodeBuild to connect to our repository, we first need to create a token as stated here.

As our tests are compute intensive, we ask for a large AWS EC2 instance in the environment section.

Finally, we specify the AWS S3 bucket where the test artifacts will be stored, and the AWS CloudWatch log group to write to. The latter is particularly useful for triggering alerts or email notifications when tests fail.

Create the Report Groups

The same procedure is used to create Report Groups where we will store the test results and test coverage.

Report Group creation configuration file for Report
Report Group creation configuration file for Coverage
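These configuration files are not reproduced here either; a sketch of what a create-report-group input file can contain (the group and bucket names are placeholders):

```json
{
  "name": "dummy_project-test-report",
  "type": "TEST",
  "exportConfig": {
    "exportConfigType": "S3",
    "s3Destination": {
      "bucket": "your-artifacts-bucket",
      "packaging": "NONE"
    }
  }
}
```

The coverage report group follows the same pattern with its own name, and each group is created with `aws codebuild create-report-group --cli-input-json file://report_group.json`.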

Gather CLI commands

Finally, all we need to do is gather the AWS CLI commands in a bash script so that we can create our AWS CodeBuild project with one simple command. In the following script, I added commands to create the project, run a build, and clean/delete the project.
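The original embedded script is not shown here, so here is a sketch of what such a wrapper could look like; the project name and the JSON file names are assumptions:

```shell
#!/usr/bin/env bash
# Sketch of an orchestration script; project and file names are assumptions.
set -euo pipefail

PROJECT_NAME="dummy_project"

create_project() {
  # Create the build project and its two report groups from JSON input files
  aws codebuild create-project --cli-input-json file://create_project.json
  aws codebuild create-report-group --cli-input-json file://report_group_tests.json
  aws codebuild create-report-group --cli-input-json file://report_group_coverage.json
}

run_build() {
  # Launch a build, i.e. run the test suite, on the project
  aws codebuild start-build --project-name "$PROJECT_NAME"
}

clean_project() {
  # Delete the project; S3 artifacts stay in their bucket for traceability
  aws codebuild delete-project --name "$PROJECT_NAME"
}

case "${1:-}" in
  create) create_project ;;
  run)    run_build ;;
  clean)  clean_project ;;
esac
```

Invoked as `./codebuild.sh create`, `./codebuild.sh run` or `./codebuild.sh clean`, this gives the one-command workflow described above.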

Conclusion

In this post, we've seen how AWS CodeBuild can help us run compute-intensive tests on a large AWS EC2 instance, using AWS S3 as a source for the reference files we want to process.

We used a Python toy project to support the demonstration, and provided the steps needed to create such a project with the AWS CLI, which lets us script the creation for better reproducibility.

Finally, we showed how to gather and orchestrate all these pieces in a simple bash script that creates the project, runs the tests, and cleans everything up in one command.
