Last month, a researcher’s machine started randomly crashing after he installed a Cursor MCP plugin - annoying. He investigated, and discovered his machine had been running a credential-harvesting script on every Python startup: quietly collecting SSH keys, AWS tokens, Kubernetes secrets, .env files, and anything else it could find, encrypting the lot, and posting it to an attacker-controlled server. The culprit was a poisoned version of LiteLLM (a dead-simple AI library, downloaded 3.4 million times per day, that provides a common interface to various LLM APIs), which had been compromised via a supply chain attack. The attackers claim to have exfiltrated 300GB of credentials before it was caught. I won’t rehash the full story - you can read about it here.
This made me think: if he’d been using a lockfile, he’d almost certainly have been fine. For recommended actions, skip to the end. Otherwise, enjoy something akin to those recipe blogs that spend 4,000 words explaining how their grandmother discovered rosemary before revealing it’s just a roast chicken recipe.
Attack
In brief, LiteLLM used Trivy, an open-source security scanner, as part of its CI/CD pipeline (a bit sad, this, because I’m a big fan of Trivy - it also meant the attack hit the best-prepared teams hardest). Trivy itself had been compromised a few days earlier by a threat actor group called TeamPCP, who rewrote its GitHub Action tags to point to a malicious release. Because LiteLLM was pulling Trivy without a pinned version, the poisoned one ran, stole LiteLLM’s PyPI publishing credentials from the GitHub Actions environment, and handed them to the attackers.
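The CI-side lesson is that tags like `@master` or `@0.28.0` are mutable pointers a compromised maintainer account can re-point, while a full commit SHA is immutable. A sketch of the difference in a workflow file (the SHA below is a placeholder, not a real Trivy release):

```yaml
# Vulnerable: resolves to whatever the tag points at *today*
- uses: aquasecurity/trivy-action@master

# Safer: pinned to an immutable commit (placeholder SHA shown)
- uses: aquasecurity/trivy-action@0123456789abcdef0123456789abcdef01234567  # v0.28.0
```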
With those credentials, TeamPCP pushed two malicious versions of LiteLLM to PyPI on 24 March 2026. The malware was nasty: a three-stage credential hoover (a Henry, even) that fired on every Python startup - not just when you ran your app, but when you ran pip, opened your IDE, anything. This was thanks to a Python .pth file¹ dropped into site-packages on install. It collected SSH keys, cloud credentials, Kubernetes tokens, .env files, and shell history, encrypted everything, and exfiltrated it to a lookalike domain registered the day before - and it installed a persistent backdoor for good measure.
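The `.pth` trick abuses documented interpreter behaviour: any line in a `site-packages/*.pth` file that begins with `import` is executed at startup, before any of your code runs. A benign sketch of the mechanism, using `site.addsitedir` (which applies the same `.pth` processing the interpreter does at startup, but to a directory of your choosing):

```python
import os
import site
import sys
import tempfile

# Drop a .pth file whose line begins with "import" - the interpreter
# doesn't treat that line as a path, it *executes* it
d = tempfile.mkdtemp()
with open(os.path.join(d, "demo.pth"), "w") as f:
    f.write("import sys; sys._pth_hook_ran = True\n")

# At startup Python runs this over every site-packages directory;
# here we invoke it by hand on our temp dir
site.addsitedir(d)

print(getattr(sys, "_pth_hook_ran", False))  # True
```

Swap the harmless flag for a payload and you have code that fires every time any Python process starts on the machine.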
It was caught within about three hours, largely because the payload had a bug that accidentally fork-bombed the researcher’s machine. Thank goodness for lazy hackers², but three hours at 3.4 million downloads a day is still a lot of machines.
Lockfiles
The first time I heard the term “virtual environment” I was sat in front of a robot arm in the robotics lab, and my lab partner (shout out @chris) was setting up our Python interface for sending control signals to the arm. He said something like “let’s just use Python’s built-in venv” and I didn’t have a clue what he was talking about. All I knew was that if I ran source .venv/bin/activate, I could actually run things. I built quite a lot in Python before really understanding virtual environments, dependencies, package managers, any of it; I just pip-installed left, right and centre into my base Python environment, not a care in the world. It wasn’t until my first job that I spent the time actually understanding what the hell was going on (largely thanks to some great mentors there!).³
My team analysed large amounts of robot telemetry data, surgery metadata, human-written reports, and a ton of other stuff to inform engineering decisions, and some pretty sizeable business decisions were made off the back of our analyses. So we had rigorous reviewing and auditing processes in place. Quick reminder of why virtual environments actually matter:
- Software changes over time: functionality gets added, removed, and modified. Different projects need different versions of the same dependencies.
- Every package you install adds overhead: latency, disk space, and attack surface. Only bring in what you actually need.
- Isolation: Virtual environments let you specify an isolated, per-project set of dependencies - keeping everything clean and separate.
Our team used pipenv as a package manager, which generated a lockfile - basically version pinning on steroids - that we committed to version control.
A lockfile specifies not just a version range but a single exact version, plus a cryptographic hash of the exact build artifact for each platform. When you install from a lockfile, your package manager downloads the package and checks its hash against what’s recorded, and if they don’t match, the install fails. This means two things: firstly, you’re guaranteed the exact same environment every time, regardless of hardware or virtualisation. Secondly, even if an attacker manages to publish a malicious version of a package you depend on, the lockfile won’t let it in. It’s pinned to a specific version and a specific hash - a poisoned 1.82.7 with different contents would have a different hash, and the install would refuse it outright.
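The check itself is nothing exotic - conceptually it’s something like this sketch (not uv’s actual implementation; the file name and contents are illustrative):

```python
import hashlib
import os
import tempfile

def verify_artifact(path: str, expected_sha256: str) -> None:
    """Recompute the artifact's digest and compare it with the
    lockfile's recorded hash; any byte-level change fails the install."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError(f"hash mismatch for {path} - refusing to install")

# Illustrative "wheel" plus the hash the lockfile recorded for it
wheel = os.path.join(tempfile.mkdtemp(), "pkg-1.82.6-py3-none-any.whl")
with open(wheel, "wb") as f:
    f.write(b"original build artifact")
recorded = hashlib.sha256(b"original build artifact").hexdigest()

verify_artifact(wheel, recorded)  # same bytes: passes silently

# An attacker republishes the "same" version with different contents...
with open(wheel, "wb") as f:
    f.write(b"poisoned build artifact")
try:
    verify_artifact(wheel, recorded)
except RuntimeError:
    rejected = True  # ...and the install refuses it outright
```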
This mattered for two reasons. First, we could rerun the same analysis months later and be confident we’d get the same results (we had data versioning in place too). Second, we could run analyses on a schedule and trust that results were comparable over time.
Why does the exact version matter that much? Say you’re running a rolling window on some joint torque telemetry. Let’s imagine that in pandas 1.2.3, df.col.rolling() (a windowing function) is backward-looking - it averages the current row and the preceding rows - but that in 1.3.0 the default changes to forward-looking, and your results silently flip. These are the kinds of differences that are almost impossible to spot, and that can quietly invalidate months of work. And they bite you if you had either no pinning at all - a requirements.txt that just says pandas - or loose pinning like pandas~=1.0 (which allows anything up to, but not including, 2.0). Not fun, and a total waste of your time (or tokens) to spend hours scrutinising every line of code to figure out what changed.
[Figure: the same joint-torque trace smoothed under pandas 1.2.3 (backward-looking) and under pandas 1.3.0 (forward-looking); the peak smoothed torque occurs at a different time in each.]
Same data, same function call, different pandas version. Different conclusion about when peak torque occurs - and the gap grows with your window size.
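To make that concrete, here’s a toy version of the flip. Pandas’ real default has always been backward-looking, so this sketch simulates the imagined forward-looking default with an explicit indexer:

```python
import pandas as pd

# Toy torque trace with a single true peak at index 4
torque = pd.Series([0.0, 1.0, 2.0, 5.0, 9.0, 4.0, 2.0, 1.0, 0.0])

# Backward-looking window (the real pandas default):
# row i averages rows i-2..i
backward = torque.rolling(window=3).mean()

# Forward-looking window, standing in for the imagined changed
# default: row i averages rows i..i+2
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=3)
forward = torque.rolling(window=indexer, min_periods=3).mean()

print(backward.idxmax())  # 5 - smoothed peak lands *after* the true peak
print(forward.idxmax())   # 3 - smoothed peak lands *before* it
```

Same series, same window size, same `.mean()` - and the reported peak moves by two samples depending purely on the windowing direction.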
Pipenv solved this, but imperfectly: resolving all the locked dependencies got frustratingly slow as the list grew (dependency resolution is an NP-hard problem), so you’d sometimes sit there for minutes just waiting for your environment to build. We moved to Poetry, which had a nicer API and resolved a bit faster, but it had its own issue: at the time it only played nicely with pure Python libraries. Not great when you’re relying on torch or tensorflow, which drag in C dependencies, CUDA drivers, and all manner of fun things underneath.
Around that time I changed jobs. The new team was using conda - great for research, quick to spin up environments, and it handled complex torch dependencies well. But the versioning was loose, which was fine for experimentation but not for shipping anything. When we started moving research into production - running on different OSes, hardware configs, and Docker images - I knew we needed to lock properly. I tried Poetry again (it still didn’t play nicely with CUDA), then conda-lock with pip on top (slow, heavyweight, and the two didn’t resolve each other’s dependencies), before eventually landing on Pixi. It handled both Python and non-Python dependencies, had a proper lockfile, and had an ambitious long-term vision of replacing Docker entirely by fully speccing out hardware-specific environments. An extremely cool project.
A few months in, Pixi migrated its Python resolver to uv under the hood - I didn’t pay much attention at the time. But then a colleague suggested we move our Python management entirely to uv, and I was sceptical. Could it really handle CUDA-enabled torch, with all its annoyingly named versions (1.3.1+cu118 and friends)? Turns out the Python wheel ecosystem had quietly matured⁴, and uv handled all of it - lightning fast⁵, proper lockfile, consistent across hardware and virtualisation environments. It’s since added workspaces too, so you can share locked dependencies across multiple services in a monorepo, which killed a whole class of gnarly serialisation bugs we’d been wrestling with.⁶
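For reference, a workspace is declared in the root `pyproject.toml` - a minimal sketch (the member paths are illustrative):

```toml
# Root pyproject.toml: every workspace member resolves against the
# same shared uv.lock, so services can't drift onto incompatible
# versions of a shared dependency
[tool.uv.workspace]
members = ["services/*", "libs/*"]
```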
CI/CD is often the forgotten child when it comes to this stuff. I’ve seen teams put real effort into keeping dev and prod consistent, then just run pip install -r requirements.txt in CI - no pinning, a different environment, potential security exposure - and when tests fail due to dependency mismatches, it’s a nightmare to debug. Your lockfile needs to cover CI/CD too. Leave no stone unturned!
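A minimal sketch of the disciplined version in GitHub Actions (job layout and action versions are illustrative - and, in the spirit of this article, you’d ideally pin the actions to full commit SHAs):

```yaml
# .github/workflows/ci.yml (illustrative)
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
      # --frozen fails loudly if uv.lock is missing or stale,
      # instead of quietly resolving fresh dependencies
      - run: uv sync --frozen
      - run: uv run pytest
```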
Fix
Back to LiteLLM: if any of the affected projects had been using a lockfile (properly, everywhere), they’d have been fine. A lockfile pins you to an exact version and hash, so when TeamPCP pushed 1.82.7 and 1.82.8, a locked environment simply wouldn’t have moved: no automatic upgrade = no poisoned package.
Worth noting that this only holds if you actually have a committed uv.lock. A pyproject.toml with litellm>=1.80.0 and no lockfile is just as exposed as a bare pip install litellm - running uv sync or uv run will happily resolve to whatever’s latest on PyPI, poisoned or not. A lot of the projects caught out weren’t doing anything obviously wrong; they just hadn’t committed a lockfile, so when their CI ran, uv did exactly what it was told and fetched the freshest compatible version.
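To make the contrast concrete (the lock entry below is illustrative - real `uv.lock` entries carry full artifact URLs and sha256 digests):

```toml
# pyproject.toml declares a *range* - resolution floats over time:
#   dependencies = ["litellm>=1.80.0"]
#
# uv.lock records an exact version plus the artifact's hash:
[[package]]
name = "litellm"
version = "1.82.6"
source = { registry = "https://pypi.org/simple" }
wheels = [
    { url = "https://files.pythonhosted.org/<path>/litellm-1.82.6-py3-none-any.whl", hash = "sha256:<recorded digest>" },
]
```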
The attack spread the way it did because of unpinned dependencies: in people’s local environments, in CI/CD pipelines, in Docker builds, and as transitive dependencies in other packages. DSPy, MLflow, and CrewAI all filed emergency PRs on the same day. This is exactly the scenario I described earlier: teams who’d been careful about their dev and prod environments, caught out because CI was just running pip install litellm with no lockfile in sight.
It’s not just Python
The same week LiteLLM was hit, axios (the most popular JavaScript HTTP library, >100 million weekly downloads) was compromised in an almost identical attack⁷.
The fix is the same across ecosystems.
- For JS/TS: `package-lock.json` (npm), `yarn.lock`, or `pnpm-lock.yaml` - and `npm ci` rather than `npm install` in CI, which is the npm equivalent of `uv sync --frozen`.
- For Rust: `Cargo.lock` is committed by convention for binaries, and Cargo respects it automatically.
- For Go: `go.sum` provides cryptographic checksums of every module version, with an additional checksum database (sum.golang.org) as a second layer.
- For C/C++: it’s a mess…
The pattern across all of these attacks is the same: trusted package, compromised maintainer credentials, malicious version pushed, credential exfiltration running silently. But the defence is the same too: lock your dependencies, enforce the lockfile everywhere, and don’t let your build system pull “latest” unattended.
What to do today (Python edition)
- Switch to uv if you haven’t yet: it’s incredibly fast, handles complex dependencies including CUDA-enabled libraries, and generates a proper lockfile. It’s one of those quality-of-life things that’s hard to describe until you switch.
- Use `uv add` to add packages: never write versions manually into `pyproject.toml`, and ban `pip install` and `uv pip install` in your projects. Instruct your coding agents to do the same - I’ve put together a starter repo with some sensible Python defaults you can drop straight into your project, or here’s how to set it up manually for Claude Code, Codex and Cursor:

  **Claude Code** - two files: `~/.claude/CLAUDE.md` for instructions, and `~/.claude/settings.json` for hard enforcement. The `settings.json` block actively prevents the commands from running even if the model forgets its instructions.

  `~/.claude/CLAUDE.md`:

  ```md
  ## Python
  - Never use `pip`, always use `uv`, never use `uv pip`
  - Add packages with `uv add` - don't write versions directly into `pyproject.toml`
  ```

  `~/.claude/settings.json`:

  ```json
  {
    "permissions": {
      "deny": [
        "Bash(pip install*)",
        "Bash(pip3 install*)",
        "Bash(python -m pip install*)",
        "Bash(python3 -m pip install*)",
        "Bash(uv pip install*)"
      ]
    }
  }
  ```

  **Codex** - similar to Claude Code: enforce as a global instruction, and enforce using rules.

  `~/.codex/AGENTS.md`:

  ```md
  ## Python
  - Never use `pip`, always use `uv`, never use `uv pip`
  - Add packages with `uv add` - don't write versions directly into `pyproject.toml`
  ```

  `~/.codex/rules/default.rules`:

  ```
  prefix_rule(
      pattern = ["pip", "install"],
      decision = "forbidden",
      justification = "Use `uv add` instead",
  )
  prefix_rule(
      pattern = ["pip3", "install"],
      decision = "forbidden",
      justification = "Use `uv add` instead",
  )
  prefix_rule(
      pattern = ["uv", "pip", "install"],
      decision = "forbidden",
      justification = "Use `uv add` instead",
  )
  ```

  **Cursor** - the weakest of the three. There’s a denylist in settings but it has documented bypass issues (obfuscated commands and shell scripts evade it). The more robust option since v1.7 is Hooks (`beforeShellExecution`), though the docs are still sparse. At minimum, add to your global `.cursor/rules/python.mdc`:

  ```
  Always use uv for Python package management. Never use pip install, pip3 install, or uv pip install. Add packages with uv add only.
  ```

  Cursor will usually comply, but unlike Claude Code and Codex, there’s no hard block.
- Commit your `uv.lock` file to version control.
- Make sure your lockfile is being used in every environment: local dev, remote dev machines, CI/CD, staging, prod. Also, importantly, make sure you’re running `uv sync --frozen` rather than bare `uv sync` in your pipeline: `--frozen` will error if the lockfile is missing or out of sync with `pyproject.toml`, rather than silently resolving fresh dependencies. You want your CI to fail loudly if someone forgot to commit an updated lockfile, not quietly pull whatever’s latest on PyPI.
Am I compromised? How to check.
If you installed or upgraded LiteLLM via pip on 24 March 2026 between 10:39 and 16:00 UTC, check the following (via Snyk and LiteLLM’s own incident report):
```bash
# Check your installed version
pip show litellm | grep Version
# If 1.82.7 or 1.82.8 - treat the system as compromised

# Check for the persistent backdoor
ls ~/.config/sysmon/sysmon.py 2>/dev/null && echo "BACKDOOR FOUND"
ls ~/.config/systemd/user/sysmon.service 2>/dev/null && echo "PERSISTENCE FOUND"

# Check for the .pth file
find $(python3 -c "import site; print(' '.join(site.getsitepackages()))") \
    -name "litellm_init.pth"
```

If anything comes back positive, rotate all credentials on that machine: SSH keys, AWS/GCP/Azure keys, API tokens, anything in .env files or shell history. Then check your Kubernetes cluster for pods named node-setup-*. Full remediation steps are in LiteLLM’s security update.
Going further: Dependabot, delayed installs, and staying on top of vulnerabilities.
A lockfile protects you from surprise upgrades, but it doesn’t automatically protect you when a vulnerability is discovered in a version you’ve already pinned. For that, you want tooling that monitors your locked dependencies and flags when a CVE lands: Dependabot (GitHub) and uv’s built-in audit command are both good options.
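A minimal Dependabot configuration looks something like this (illustrative - check GitHub’s current docs for how uv-managed projects are covered under the Python ecosystems):

```yaml
# .github/dependabot.yml (illustrative)
version: 2
updates:
  - package-ecosystem: "pip"   # Python dependency files, incl. lockfiles
    directory: "/"
    schedule:
      interval: "daily"
```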
One underused trick: both uv and npm let you configure a minimum package age before installation, so you’d only install packages that have been live on PyPI for, say, 3 days. The LiteLLM malicious versions were live for ~3 hours. That alone would have saved a lot of people.
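In uv, the knob I’d reach for is `exclude-newer`, which ignores any distribution published after a fixed timestamp - a cutoff you bump deliberately, rather than a rolling N-day window (the timestamp below is illustrative; check the uv docs for what your version supports):

```toml
# pyproject.toml - uv will ignore anything uploaded to the index
# after this instant when resolving
[tool.uv]
exclude-newer = "2026-03-21T00:00:00Z"
```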
Development security can be dull work, but it’s moments like this that you realise how vulnerable to human error all of these things are. Make your life easier by making it harder to screw up the little things - the unglamorous stuff ends up being what actually matters.
The robot arm never got compromised. Partially because it predated half of this tooling, but mostly because @chris knew what he was doing. Be more @chris.
I hope you enjoyed reading - comments, concerns, corrections and strong disagreements all welcome!
Thanks to Ben Harrison, Janosch Menke, Mustafa Al-Quraishi & Chris Parsons for their feedback on early versions of this article
Footnotes
1. Not the PyTorch model checkpoint kind - I made that mistake too - but a much more sinister Python interpreter startup hook that fires before your code runs, before any imports. ↩
2. Andrej Karpathy publicly speculated it was vibe-coded. ↩
3. It was also one of these mentors who taught me about typosquatting - where malicious packages are published under names similar to widely used ones but with typos. Worth a listen/read if you haven’t come across it before. Who knew typos could be so lethal… ↩
4. In large part due to the manylinux wheel standard. ↩
5. It’s written in Rust, duh. ↩
6. Astral, the company behind uv (and ruff), was acquired by OpenAI in early 2025. ↩
7. Maintainer credentials stolen, malicious versions pushed to npm, a backdoor deployed across Windows, macOS and Linux - read about it here. ↩