Memory corruption bugs? In Python?!
When I published Frelatage a year ago, I was often told: “Why fuzz Python? There is nothing interesting to find in this language.” Of course, when people think of fuzzing, they think of memory corruption vulnerabilities, and therefore often assume that only codebases written in C/C++ can have this type of vulnerability. However, a significant number of Python libraries have at least one part written in C, mostly for performance reasons.
Here are some examples of Python modules written (at least partially) in C, and which are therefore potentially exposed to memory corruption bugs:
- Pillow
- UltraJSON
- Most of Python's built-in modules
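If you are not sure whether a module you want to fuzz contains C code, one rough heuristic is to look at where it was loaded from. The sketch below assumes CPython, and `is_c_extension` is a helper name of my own, not a standard API. It reports a module as C code if it is compiled into the interpreter (listed in `sys.builtin_module_names`) or if its file ends with one of `importlib.machinery.EXTENSION_SUFFIXES` (`.so`, `.pyd`, ...).

```python
import importlib
import importlib.machinery
import sys
import types


def is_c_extension(module: types.ModuleType) -> bool:
    """Heuristic: True if the module itself is implemented in C."""
    # Modules compiled directly into the interpreter binary have no source file.
    if module.__name__ in sys.builtin_module_names:
        return True
    # Dynamically loaded extensions end with e.g. ".cpython-311-x86_64-linux-gnu.so" or ".pyd".
    path = getattr(module, "__file__", None) or ""
    return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


if __name__ == "__main__":
    for name in ("sys", "math", "json", "_json"):
        module = importlib.import_module(name)
        print(f"{name:6} -> {is_c_extension(module)}")
```

This only looks at one module at a time: a package that is pure Python on the surface can still delegate to a C accelerator, which is exactly what `json` does with `_json`, so the private modules a package imports are worth checking too.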
🏭 Create a demo vulnerable module
So we will start by writing a vulnerable Python module in C. Here is a minimal extension containing a `buffer_overflow_vulnerable` method which, as its name suggests, is vulnerable to a buffer overflow. It does only one thing: copy an arbitrary input string into a 667-byte buffer with `strcpy`, which overflows the buffer as soon as the input is 667 bytes or longer (`strcpy` also copies the terminating NUL byte).
```c
#define PY_SSIZE_T_CLEAN
#include <Python.h>  // must be included before any standard header
#include <string.h>  // strcpy

// A method vulnerable to a buffer overflow.
static PyObject *buffer_overflow_vulnerable(PyObject *self, PyObject *args) {
    const char *input;
    if (!PyArg_ParseTuple(args, "s", &input)) {
        return NULL;
    }
    /* /!\ Buffer overflow /!\ */
    char buffer[667] = {0};
    strcpy(buffer, input);  // no bounds check: inputs of 667+ bytes overflow
    Py_RETURN_NONE;
}

// Method definition table for this extension
static PyMethodDef mymodule_methods[] = {
    {
        "buffer_overflow_vulnerable", buffer_overflow_vulnerable, METH_VARARGS,
        "Method vulnerable to a buffer overflow."
    },
    {NULL, NULL, 0, NULL}  // sentinel
};

// Module definition
static struct PyModuleDef mymodule_definition = {
    PyModuleDef_HEAD_INIT,
    "mymodule",
    "A Python module vulnerable to a buffer overflow, for demonstration purposes.",
    -1,
    mymodule_methods
};

// Module initialization: the interpreter is already running when this is
// called, so there is no need to call Py_Initialize() here.
PyMODINIT_FUNC PyInit_mymodule(void) {
    return PyModule_Create(&mymodule_definition);
}
```
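To pin down exactly when this overflows: the `"s"` format of `PyArg_ParseTuple` hands the C code a UTF-8 encoded copy of the Python string, and `strcpy` writes all of those bytes plus a terminating NUL. The hypothetical helper below (not part of the module) mirrors that arithmetic in Python; note that non-ASCII characters take several bytes each, so fewer of them are needed to overflow.

```python
BUFFER_SIZE = 667  # size of `char buffer[667]` in mymodule.c


def overflows(text: str, buffer_size: int = BUFFER_SIZE) -> bool:
    """Would strcpy(buffer, input) write past the end of the buffer?"""
    # PyArg_ParseTuple's "s" format passes a UTF-8 encoded copy of the string,
    # and strcpy copies every byte plus the terminating NUL.
    bytes_written = len(text.encode("utf-8")) + 1
    return bytes_written > buffer_size


print(overflows("a" * 666))  # False: 666 bytes + NUL = 667, fits exactly
print(overflows("a" * 667))  # True: 668 bytes written
print(overflows("é" * 334))  # True: "é" is 2 bytes in UTF-8
```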
Then we create the `setup.py` file, which tells the build tooling how to compile `mymodule.c` into an extension module that can be imported from Python.
```python
#!/usr/bin/env python3
# encoding: utf-8
# setuptools replaces distutils, which was removed in Python 3.12 (PEP 632).
from setuptools import setup, Extension

mymodule = Extension('mymodule', sources=['mymodule.c'])

setup(name='mymodule',
      version='0.1.0',
      description='A Python module vulnerable to a buffer overflow, for demonstration purposes.',
      ext_modules=[mymodule])
```
And now it’s time to build our module:
```shell
pip3 install .
```
We test it in a Python interpreter: with a short input, the call returns `None` without complaint, so the module works as expected.
```python
>>> import mymodule
>>> mymodule.buffer_overflow_vulnerable("a")
```
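Since we planted the bug ourselves, we can also trigger it by hand. Doing so in the current interpreter would kill it, so this sketch (`crashes_in_subprocess` is a hypothetical helper) runs a snippet in a fresh Python process and reports whether that process was killed by a signal. It assumes a POSIX system, where `subprocess` reports death by signal N as return code -N.

```python
import subprocess
import sys


def crashes_in_subprocess(code: str, timeout: float = 30.0) -> bool:
    """Run `code` in a fresh interpreter; True if it was killed by a signal."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        timeout=timeout,
    )
    # On POSIX, a negative return code -N means the child died from signal N
    # (an abort from a detected overflow shows up as -SIGABRT).
    return result.returncode < 0
```

For example, `crashes_in_subprocess("import mymodule; mymodule.buffer_overflow_vulnerable('a' * 1000)")` should return `True` on a build where the overflow is detected at runtime (for example by `_FORTIFY_SOURCE`, which produces the "buffer overflow detected" abort shown later, or by the stack protector); an unprotected build might overflow silently and return `False`.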
Now it’s time to find this bug with the help of a fuzzer 😈
🕵️ Finding the bug with a fuzzer
Fuzzing is one of the easiest ways to find this kind of vulnerability. There are several fuzzers for Python; today I chose to write the harness using Atheris, Google's coverage-guided Python fuzzer built on libFuzzer.
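At its core, a fuzzer is a loop that generates inputs, feeds them to a target, and keeps the ones that crash it. The sketch below is a deliberately naive, blind random fuzzer run against a pure-Python stand-in for our C method (`simulated_target`, `SimulatedCrash` and `random_fuzz` are all hypothetical names), so it runs without compiling anything. Atheris is smarter: it mutates inputs and uses coverage feedback to favor the ones that reach new code, but the basic loop is the same idea.

```python
import random
from typing import Callable, Optional

BUFFER_SIZE = 667  # size of the C buffer in mymodule.c


class SimulatedCrash(Exception):
    """Stands in for the abort a real out-of-bounds write would cause."""


def simulated_target(data: bytes) -> None:
    # Pure-Python stand-in for buffer_overflow_vulnerable: "crash" whenever
    # strcpy would need more than BUFFER_SIZE bytes (data + terminating NUL).
    if len(data) + 1 > BUFFER_SIZE:
        raise SimulatedCrash(f"{len(data)}-byte input overflows the buffer")


def random_fuzz(
    target: Callable[[bytes], None],
    max_len: int = 1024,
    iterations: int = 10_000,
    seed: int = 0,
) -> Optional[bytes]:
    """Blind random fuzzing: no mutation, no coverage feedback."""
    rng = random.Random(seed)  # seeded so every run is reproducible
    for _ in range(iterations):
        length = rng.randint(0, max_len)
        # Skip NUL bytes: strcpy would stop at the first one.
        data = bytes(rng.randint(1, 255) for _ in range(length))
        try:
            target(data)
        except SimulatedCrash:
            return data  # first crashing input, like libFuzzer's crash-* file
    return None  # no crash found within the budget


crash = random_fuzz(simulated_target)
print("crash found:", crash is not None, "length:", None if crash is None else len(crash))
```

With a single length check as the only "bug", blind guessing finds it almost immediately; coverage guidance starts to matter for bugs hidden behind many branches.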
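Here is the harness:
```python
#!/usr/bin/env python3
import sys

import atheris

# Instrument the Python code imported in this block so Atheris gets coverage
# feedback (the C code inside mymodule is not instrumented by this).
with atheris.instrument_imports():
    import mymodule


def fuzz_mymodule(input_bytes):
    # Turn the raw fuzzer bytes into a Python string of arbitrary length.
    fdp = atheris.FuzzedDataProvider(input_bytes)
    data = fdp.ConsumeString(sys.maxsize)
    try:
        mymodule.buffer_overflow_vulnerable(data)
    except ValueError:
        # PyArg_ParseTuple("s") rejects strings with NUL characters or
        # unencodable surrogates; those are not the bug we are looking for.
        return


atheris.Setup(sys.argv, fuzz_mymodule)
atheris.Fuzz()
```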
I launch the fuzzer and, after a few very long seconds (3) of waiting, a crash occurs:
```
*** buffer overflow detected ***: terminated
==6746== ERROR: libFuzzer: deadly signal
NOTE: libFuzzer has rudimentary signal handlers.
Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
MS: 4 InsertRepeatedBytes-InsertRepeatedBytes-CopyPart-CopyPart-; base unit: adc83b19e793491b1c6ea0fd8b46cd9f32e592fc
artifact_prefix='./'; Test unit written to ./crash-854298749b437ea327679792bf9cc9addf5876e6
```
The crash file written by libFuzzer (the raw input that triggered the abort) is 830 bytes long, well above the 667 bytes of our buffer:
```shell
$ wc ./crash-854298749b437ea327679792bf9cc9addf5876e6
1 0 830 ./crash-854298749b437ea327679792bf9cc9addf5876e6
```
🎉 Conclusion
So we have proof that memory corruption vulnerabilities are not exclusive to C and C++: through C extensions, they also reach Python code.
To conclude, here is a short list of memory corruption vulnerabilities found in Python libraries:
- CVE-2022-37454, a buffer overflow in the XKCP SHA-3 implementation used by Python's hashlib
- CVE-2020-35653, a buffer over-read in Pillow's PCX decoder when decoding a malicious PCX image
- CVE-2021-45958, a stack-based buffer overflow in UltraJSON's encoder
Now you know what to fuzz to find vulnerabilities on still little-explored ground!
Happy vulnerability hunting! 😊