Unpatched 15-year-old Python bug allows code execution in 350k projects

A vulnerability in the Python programming language that was neglected for 15 years is now back in the spotlight as it likely affects over 350,000 open source repositories and can lead to code execution.

Disclosed in 2007 and tagged CVE-2007-4559, the security issue never received a fix, with the only mitigation provided being a documentation update warning developers of the risk.

Not patched since 2007

The vulnerability is in the Python tarfile package, in code that calls tarfile.extract() on unsanitized input or relies on the built-in defaults of tarfile.extractall(). It is a path traversal bug that allows an attacker to overwrite arbitrary files.

Technical details of CVE-2007-4559 have been available since the initial report in August 2007. Although there are no reports of the bug being used in attacks, it represents a risk to the software supply chain.

Earlier this year, while investigating another security issue, a researcher at Trellix rediscovered CVE-2007-4559. Trellix is a new company providing extended detection and response (XDR) solutions, formed from the merger of McAfee Enterprise and FireEye.

“Failing to write any safety code to sanitize the member files before calling tarfile.extract() or tarfile.extractall() results in a directory traversal vulnerability, allowing a malicious actor to gain access to the file system” – Charles McFarland, Vulnerability Researcher at Trellix’s Advanced Threat Research Team

The flaw stems from the fact that the code in the extract function of Python's tarfile module explicitly trusts the information in the TarInfo object and “joins the path that is passed to the extract function and the name in the TarInfo object”.

CVE-2007-4559 - path joining filename
source: Trellix
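
To make that join concrete, the sketch below reproduces it in isolation. It is an illustration, not tarfile's internal code: the destination directory and the member name ../../.bashrc are hypothetical, and posixpath is used so the output is the same on any operating system. Nothing is written to disk.

```python
import posixpath
import tarfile

# Hypothetical values: where an application extracts archives,
# and a member name an attacker placed inside a crafted .tar file.
dest = "/home/user/project/extracted"
member = tarfile.TarInfo(name="../../.bashrc")

# The naive join: destination directory + name taken straight from the archive.
target = posixpath.join(dest, member.name)

# Collapsing the ".." components shows where a write would actually land.
resolved = posixpath.normpath(target)

print(target)    # /home/user/project/extracted/../../.bashrc
print(resolved)  # /home/user/.bashrc
```

Because the joined path is never checked, writing to it would overwrite a file two levels above the extraction directory, in this hypothetical case a shell startup file that typically runs the next time the user opens an interactive bash shell.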

Less than a week after the disclosure, a post on the Python bug tracker announced that the issue was closed; the only fix was a documentation update warning “that it may be dangerous to extract archives from untrusted sources”.

Estimate of 350,000 projects impacted

Analyzing the impact, Trellix researchers found that the vulnerability was present in thousands of software projects, both open source and closed source.

The researchers retrieved a set of 257 repositories most likely to include the vulnerable code and manually checked 175 of them to see if they were affected. This revealed that 61% of them were vulnerable.

Running an automated check on the rest of the repositories raised the share of impacted projects to 65%, indicating a widespread issue.

However, the small set of samples only served as a benchmark to arrive at an estimate of all impacted repositories available on GitHub.

“With the help of GitHub, we were able to get a much larger dataset to include 588,840 unique repositories that include ‘import tarfile’ in its python code” – Charles McFarland

Using the manually verified vulnerability rate of 61%, Trellix estimates that there are over 350,000 vulnerable repositories, many of which are used by machine learning tools (e.g. GitHub Copilot) that help developers complete a project faster.
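
The headline figure is a straightforward extrapolation: the sample's vulnerability rate applied to the GitHub dataset. The back-of-the-envelope check below uses only the numbers reported above and assumes, as Trellix's estimate does, that the sample rate holds across all 588,840 repositories.

```python
# Figures as reported by Trellix.
repos_importing_tarfile = 588_840  # unique GitHub repositories containing "import tarfile"
manual_rate = 0.61                 # vulnerable share in the manually checked sample
automated_rate = 0.65              # share after the automated check of the remaining repositories

# Extrapolate each rate to the full dataset.
estimate = round(repos_importing_tarfile * manual_rate)
upper = round(repos_importing_tarfile * automated_rate)

print(f"{estimate:,}")  # 359,192
print(f"{upper:,}")     # 382,746
```

Using the more conservative, manually verified 61% gives roughly 359,000 repositories, which is where the "over 350,000" figure comes from.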

These automated tools draw on code from hundreds of thousands of repositories to provide “auto-completion” options. If they provide insecure code, the problem spreads to other projects without the developer knowing.

GitHub Copilot suggests vulnerable tar file extraction code
source: Trellix

Digging deeper into the problem, Trellix discovered that the open source code vulnerable to CVE-2007-4559 “spans a huge number of industries”.

As expected, the development sector is the most affected, followed by web and machine learning technologies.

Code vulnerable to CVE-2007-4559 present in all sectors
source: Trellix

Exploiting CVE-2007-4559

In a technical blog post today, Trellix vulnerability researcher Kasimir Schulz, who rediscovered the bug, outlined the simple steps to exploit CVE-2007-4559 in the Windows version of Spyder IDE, an open-source, cross-platform integrated development environment for scientific programming.

The researchers showed that the vulnerability can also be exploited on Linux: in a test against Polemarch, an IT infrastructure management service, they escalated the file write to code execution.

In addition to drawing attention to the vulnerability and the risk it poses, Trellix also created patches for just over 11,000 projects. The fixes will be available in a fork of each impacted repository and will later be submitted to the main projects as pull requests.

Due to the large number of affected repositories, the researchers expect over 70,000 projects to receive a fix in the coming weeks. Reaching 100% is a difficult challenge, as the pull requests must also be accepted by the project maintainers.
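
Trellix's actual patches are not reproduced here. As a rough idea of what such a fix involves, the following is a minimal sketch of one commonly used mitigation: resolve each member's destination and refuse the archive if any path escapes the target directory. The helper names safe_extractall, is_within_directory and build_archive are hypothetical. The path check alone does not catch symlink or hardlink members whose link targets point outside the directory; on Python versions that provide tarfile.data_filter (Python 3.12, plus some later security releases of older branches), the sketch also passes the built-in "data" filter, which rejects such links.

```python
import io
import os
import tarfile
import tempfile


def is_within_directory(directory: str, target: str) -> bool:
    """Return True if `target` resolves to a location inside `directory`."""
    abs_directory = os.path.realpath(directory)
    abs_target = os.path.realpath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extractall(tar: tarfile.TarFile, path: str) -> None:
    """Hypothetical helper: refuse the whole archive if any member escapes `path`."""
    for member in tar.getmembers():
        # Same join the vulnerable code performs, but checked before anything is written.
        target = os.path.join(path, member.name)
        if not is_within_directory(path, target):
            raise ValueError(f"Blocked path traversal attempt: {member.name!r}")
    if hasattr(tarfile, "data_filter"):
        # Newer Pythons: also apply the built-in "data" extraction filter.
        tar.extractall(path, filter="data")
    else:
        tar.extractall(path)


def build_archive(name: str, payload: bytes) -> io.BytesIO:
    """Build an in-memory tar containing one regular file called `name`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


# Usage: a crafted archive is rejected, a benign one extracts normally.
with tempfile.TemporaryDirectory() as dest:
    try:
        with tarfile.open(fileobj=build_archive("../escaped.txt", b"owned")) as tar:
            safe_extractall(tar, dest)
        blocked = False
    except ValueError as exc:
        print(exc)  # Blocked path traversal attempt: '../escaped.txt'
        blocked = True

    with tarfile.open(fileobj=build_archive("notes/readme.txt", b"hello")) as tar:
        safe_extractall(tar, dest)
    with open(os.path.join(dest, "notes", "readme.txt"), "rb") as f:
        extracted = f.read()
```

Checking every member before extracting any of them means a malicious archive is rejected as a whole, rather than being partially written to disk before the bad entry is reached.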

BleepingComputer has contacted the Python Software Foundation for comment on CVE-2007-4559 but had not received a response by press time.