About 5000 secrets left in code and 8 malicious obfuscators found in the PyPI repository

Brother · Nov 25, 2023

GitGuardian researchers have published the results of an analysis of confidential data that developers left behind in code hosted in the Python package repository PyPI (Python Package Index). After studying more than 9.5 million files and 5 million package releases belonging to 450,000 projects, they identified 56,866 leaks of confidential data. Counting only unique secrets, without duplicates across releases, there were 3,938 leaks, and 2,922 projects contained at least one leak.

In total, more than 150 types of leaked confidential information were identified, including ordinary passwords, cryptographic keys, and access tokens for cloud services, continuous integration systems, and APIs. At least 768 credentials were still valid at the time of the study. Common leaks that were still valid include Azure Active Directory access keys; SSH, MongoDB, MySQL, and PostgreSQL credentials; GitHub OAuth App, Dropbox, and Auth0 keys; and Coinbase and Twilio login parameters.

Among the leak types that are becoming more common, the researchers single out Telegram bot access tokens, whose number doubled in early 2021 and doubled again in the spring of 2023. A steady increase has also been recorded since 2020 for Google API access keys, and since 2022 for DBMS credentials. The packages with the most leaks include chatllm and safire, in which 209 OpenAI keys and 320 Google Cloud keys, respectively, were left behind.
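As a rough illustration of how token-like strings of this kind can be spotted in source files, here is a minimal sketch. The two regular expressions are assumptions based on the commonly seen shapes of Telegram bot tokens and Google API keys; they are not GitGuardian's detectors, and the names `PATTERNS` and `find_candidate_secrets` are hypothetical. A real scanner uses many more detectors and checks whether a candidate is still valid.

```python
import re

# Assumed, approximate token shapes (not an official specification).
# Lookarounds keep a match from starting or ending inside a longer token.
_EDGE = r"[A-Za-z0-9_-]"
PATTERNS = {
    "telegram_bot_token": re.compile(
        rf"(?<!{_EDGE})\d{{8,10}}:[A-Za-z0-9_-]{{35}}(?!{_EDGE})"
    ),
    "google_api_key": re.compile(
        rf"(?<!{_EDGE})AIza[A-Za-z0-9_-]{{35}}(?!{_EDGE})"
    ),
}


def find_candidate_secrets(text):
    """Return (kind, line_number, match) tuples for token-like strings."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for m in pattern.finditer(line):
                hits.append((kind, lineno, m.group(0)))
    return hits
```

Running `find_candidate_secrets(open(path).read())` over each file of a project before release gives a list of candidates to review by hand; a match is only a hint, not proof of a live credential.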

Apart from files with the ".py" extension, the file types with the most leaks are .json (610 leaks), .md (270), PKG-INFO (240), METADATA (210), and .txt (170), as well as README files (209) and files in directories named test (675). Many leaks are also caused by oversights and mistakes in configuring which files are excluded when packages are built. For example, local configuration files (.cookiecutterrc, .env, .pypirc, etc.) can be excluded from the Git repository via ".gitignore", but that file is not taken into account when the package is created. In particular, 43 .pypirc files containing credentials for accessing PyPI were found in the repository. In 15 cases, developers had not intended to publish packages that were originally created for internal use, but uploaded them to PyPI by mistake.
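One way to catch this kind of mistake is to inspect the built archive itself before uploading it, rather than relying on ".gitignore". The sketch below is a hedged example, not a tool from the study: the function names and the `SENSITIVE_NAMES` set are hypothetical, and the set contains only the file names mentioned above.

```python
import tarfile
import zipfile
from pathlib import PurePosixPath

# File names taken from the examples above; extend the set for your project.
SENSITIVE_NAMES = {".env", ".pypirc", ".cookiecutterrc"}


def sensitive_members(names):
    """Return the member paths whose base name is in SENSITIVE_NAMES."""
    return [n for n in names if PurePosixPath(n).name in SENSITIVE_NAMES]


def archive_members(path):
    """List member paths of a built sdist (.tar.gz) or wheel/zip archive."""
    path = str(path)
    if path.endswith((".whl", ".zip")):
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    with tarfile.open(path) as tf:  # default mode detects the compression
        return tf.getnames()


def check_distribution(path):
    """Return the sensitive files found in one built distribution archive."""
    return sensitive_members(archive_members(path))
```

For instance, calling `check_distribution` on each file in a project's `dist/` directory (an assumed location) and refusing to upload when the result is non-empty would have flagged the .pypirc cases described above.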

Two other recent PyPI-related events are also worth mentioning:

• 8 malicious packages were identified in the PyPI repository. They were presented as obfuscation utilities, i.e. tools that make code unreadable so that the underlying algorithm is hard to recover. The packages had the string "pyobf" in their names (Pyobftoexe, Pyobfusfile, Pyobfexecute, Pyobfpremium, Pyobflight, Pyobfadvance, Pyobfuse, and pyobfgood) and were downloaded more than 2000 times.

The malicious code embedded in the packages targeted the Windows platform. It allowed an attacker to connect to an external command-and-control server, run arbitrary commands on the developer's computer, find confidential information such as access keys and send it to an external server, and transfer arbitrary files from the system. In addition, the malicious code could act as a keylogger, intercept passwords entered in Chrome, take screenshots, record audio, and even control the webcam.

• The results of an independent audit have been published. It covered the code base of the tools that run the pypi.org repository and the "cabotage" framework used in its container orchestration infrastructure. The audit was conducted with the support of the non-profit organization OTF (Open Technology Fund). It did not reveal any high-severity problems, and the source code was found to meet the basic requirements for secure coding. At the same time, insufficient test coverage of the cabotage code base was noted, and 29 issues were identified, of which 8 were rated moderate severity, 6 low, and 14 were marked as informational.

Most notable issues:

* Insufficient verification of the digital signatures used to integrate PyPI with AWS SNS, which allowed notifications to be sent to individual users' email addresses.

* An information leak in the download handler that allows you to determine the existence of an account without generating login attempt events.

* Use of unreliable cryptographic hashes that do not exclude cache poisoning attacks.

* An attacker with rights to run build processes via cabotage could potentially substitute their own commands.

* An attacker with deployment rights in cabotage could potentially deploy a legitimate-looking image.
