Playing with Bun's URL parser
During the latest SECCON CTF quals, I ended up looking at Bun’s URL parser, hoping to find some weird behavior that could help me solve a challenge.
In particular, we were presented with this code:
const LOCALHOST = new URL(`http://localhost`);
const url = new URL(req.url, LOCALHOST);
if (url.hostname !== LOCALHOST.hostname) {
res.send("Try harder 1");
return;
}
if (url.protocol !== LOCALHOST.protocol) {
res.send("Try harder 2");
return;
}
// ... other chall details ...
res.send(await fetch(url).then((r) => r.text()));
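To make the checks concrete, here is a quick sketch of what they compare, using the standard WHATWG URL class (run under Node here; the inputs are mine, not part of the challenge):

```javascript
// How req.url is resolved against the base, and what the two checks compare.
const LOCALHOST = new URL("http://localhost");

// A plain path keeps the base's host and protocol: both checks pass.
const ok = new URL("/somewhere", LOCALHOST);
console.log(ok.hostname, ok.protocol); // "localhost" "http:"

// An absolute (or protocol-relative) URL replaces the host: the first check fails.
const evil = new URL("//attacker.example/x", LOCALHOST);
console.log(evil.hostname); // "attacker.example"
```

So to reach the fetch(), the attacker-controlled req.url must parse to hostname localhost and protocol http: under this parser.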
After some digging in the source code, it turned out that in Bun the URL
class uses WebKit’s URL parser, while fetch()
uses a custom Bun one, implemented in the url.zig
file. Given this information, inspired by Orange’s good old “A New Era of SSRF” talk, I was determined to find some parsing differentials to solve the challenge.
While this ended up leading to nothing, and in no way to me solving the challenge, I have found plenty of weird (and wrong) parsing behaviors in the custom parser. Sadly for me, but thankfully for everybody else, none of these are currently exploitable, or require a completely unreasonable threat model to be considered security issues. They resulted in a bunch of GitHub issues, though (#16181, #16182, #16183, #17435). Hopefully I haven’t ruined anybody’s future challenge :’).
Before we start
The reason these issues are not exploitable is that, prior to using the custom parser, Bun always† uses the WebKit parser to normalize the URLs. Because of this, all but one of these parsing mistakes cannot be used in practice.
†Actually, there is one case in which URLs are not normalized first, and that’s when parsing the values of the HTTP_PROXY
and HTTPS_PROXY
environment variables. These two URLs are subject to every one of the errors I identified, but requiring an attacker to run Bun processes with a custom environment is a pretty unreasonable scenario, in which URL parsing mistakes would probably be the least of the issues.
If you are interested in the setup I used to debug and test the parser, as well as my struggles with the Zig language, you can read more at the end of the post.
Auth user parsed as part of the host
This is the only issue that isn’t mitigated by WebKit’s normalization, as the URLs that cause it are already in a normalized format.
Let’s start looking at some Zig code!
if (!is_relative_path) {
// if there's no protocol or @, it's ambiguous whether the colon is a port or a username.
if (offset > 0) {
// see https://github.com/oven-sh/bun/issues/1390
const first_at = strings.indexOfChar(base[offset..], '@') orelse 0;
const first_colon = strings.indexOfChar(base[offset..], ':') orelse 0;
if (first_at > first_colon and first_at < (strings.indexOfChar(base[offset..], '/') orelse std.math.maxInt(u32))) {
offset += url.parseUsername(base[offset..]) orelse 0;
offset += url.parsePassword(base[offset..]) orelse 0;
}
}
offset += url.parseHost(base[offset..]) orelse 0;
}
When parsing a URL, after the scheme, we could encounter a :
(colon) for two reasons: either it separates username and password in the authentication, or it separates the hostname from the port. Any other :
should be URL-encoded, and has no special meaning in the URL.
To differentiate these two important cases, Bun takes the index of the first occurrence of a :
and the index of the first occurrence of an @
(at), and uses them to identify the situation accordingly.
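Stripped of the Zig details, the heuristic can be modeled like this (a sketch of the logic above, not Bun's actual code):

```javascript
// A tiny JavaScript model of Bun's heuristic. `rest` is whatever follows
// the scheme, e.g. "user@example.com:1337". Returns whether the parser
// attempts to parse a username/password before the host.
function attemptsUserinfoParsing(rest) {
  // Bun falls back to 0 when the character is absent ("orelse 0" in the Zig code).
  const firstAt = Math.max(rest.indexOf("@"), 0);
  const firstColon = Math.max(rest.indexOf(":"), 0);
  const firstSlash = rest.indexOf("/") === -1 ? Infinity : rest.indexOf("/");
  return firstAt > firstColon && firstAt < firstSlash;
}

console.log(attemptsUserinfoParsing("user:password@example.com")); // true
console.log(attemptsUserinfoParsing("user@example.com:1337"));     // false
```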
Let’s see some examples:
Example 1
example.com
Here there is no :
nor @
, so the values of first_at
and first_colon
fall back to 0
, and the expression first_at > first_colon
evaluates to false. Everything is parsed as the host.
Example 2
example.com:1337
Here there is a :
, but no @
, so the value of first_colon
is greater than first_at
, resulting in everything being parsed as the host.
Example 3
user:password@example.com
Here we have both a :
and an @
, and since the :
is before the @
, we parse the username and password before parsing the host. Everything is correct!
But having a password is totally optional; we could just have a username… What happens in that case?
Example 4
user@example.com
Since the :
index falls back to 0, we are basically in the same situation as before: username and password are parsed before the host, and the password parser correctly handles an empty string.
But what about… this?
Example 5
user@example.com:1337
Now there is a :
, and this :
follows the @
character, meaning that first_at > first_colon
evaluates to false. The user parsing is skipped, and everything is parsed as the host. This results in a URL having hostname user@example.com
and port 1337
, with no auth.
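For comparison, a WHATWG-compliant parser (Node's URL class here, which follows the same spec as the WebKit parser) handles this exact URL as expected:

```javascript
// WHATWG behavior for Example 5: the "@" splits userinfo from host,
// and the ":" after it is the port separator.
const expected = new URL("http://user@example.com:1337");
console.log(expected.username, expected.hostname, expected.port);
// "user" "example.com" "1337"
```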
Impact
The impact of this is very limited. While we cannot register a TLD containing an @
, we can in theory have a custom DNS server responding for a subdomain containing an @
, provided we can convince a client to resolve it. I was pretty sure I managed to do that with an entry in /etc/hosts
, but I can’t seem to reproduce it anymore, so I might be remembering wrong.
In any case, to trick a check like the one from the challenge, we would need to be in complete control of the name server associated with the subdomain of the whitelisted domain. Pretty hard.
IPv6 parsing
One interesting (and useless) extra point about this parsing mistake is that technically we can have a hostname that is not a subdomain of the original domain. Looking at the parseHost()
method:
pub fn parseHost(url: *URL, str: string) ?u31 {
var i: u31 = 0;
//if starts with "[" so its IPV6
if (str.len > 0 and str[0] == '[') {
i = 1;
var ipv6_i: ?u31 = null;
var colon_i: ?u31 = null;
while (i < str.len) : (i += 1) {
ipv6_i = if (ipv6_i == null and str[i] == ']') i else ipv6_i;
// ... omitted for brevity ...
}
url.host = str[0..i];
if (ipv6_i) |ipv6| {
//hostname includes "[" and "]"
url.hostname = str[0 .. ipv6 + 1];
}
// ... omitted for brevity ...
} else {
// ... omitted for brevity ...
}
}
If the host starts with a [
character, then we consider it an IPv6 address, and extract the hostname using the closing ]
. This means that the following URL
[test]@example.com:1337
will be parsed as having hostname [test]
and no auth.
This is useless in practice: we cannot use a real IPv6 address here, since it contains colons and would therefore trigger the username and password parsing. In addition to this, the normalization process URL-encodes every square bracket in the username.
It’s still a “fun fact” :).
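Both halves of that last point can be checked against a WHATWG parser (Node's URL class; `[test]` is my own placeholder, as in the example above):

```javascript
// A real bracketed IPv6 host keeps its brackets in the hostname...
const v6 = new URL("http://[::1]:1337/");
console.log(v6.hostname); // "[::1]"

// ...while brackets appearing in the userinfo get percent-encoded
// during normalization, so Bun's custom parser never sees them.
const bracketed = new URL("http://[test]@example.com/");
console.log(bracketed.username); // "%5Btest%5D"
console.log(bracketed.hostname); // "example.com"
```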
# is not used as a host delimiter like / and ?
When parsing usernames, passwords, and hosts, the /
and ?
characters are used to recognize the end of the host, and the beginning of the path or query:
pub fn parseUsername(url: *URL, str: string) ?u31 {
// ... omitted for brevity ...
for (0..str.len) |i| {
switch (str[i]) {
':', '@' => {
// ... omitted for brevity ...
},
// if we reach a slash or "?", there's no username
'?', '/' => {
return null;
},
else => {},
}
}
return null;
}
pub fn parsePassword(url: *URL, str: string) ?u31 {
// ... omitted for brevity ...
for (0..str.len) |i| {
switch (str[i]) {
'@' => {
// ... omitted for brevity ...
},
// if we reach a slash or "?", there's no password
'?', '/' => {
return null;
},
else => {},
}
}
return null;
}
pub fn parseHost(url: *URL, str: string) ?u31 {
// ... omitted for brevity ...
//if starts with "[" so its IPV6
if (str.len > 0 and str[0] == '[') {
// ... omitted for brevity ...
while (i < str.len) : (i += 1) {
// ... omitted for brevity ...
switch (str[i]) {
// alright, we found the slash or "?"
'?', '/' => {
break;
},
else => {},
}
}
// ... omitted for brevity ...
} else {
// look for the first "/" or "?"
// if we have a slash or "?", anything before that is the host
// anything before the colon is the hostname
// anything after the colon but before the slash is the port
// the origin is the scheme before the slash
var colon_i: ?u31 = null;
while (i < str.len) : (i += 1) {
colon_i = if (colon_i == null and str[i] == ':') i else colon_i;
switch (str[i]) {
// alright, we found the slash or "?"
'?', '/' => {
break;
},
else => {},
}
}
// ... omitted for brevity ...
}
// ... omitted for brevity ...
}
The issue is that the #
character, used to start the fragment part of the URL, should be used as a delimiter as well. With the current implementation, a URL like http://localhost#example.com
is parsed as having the single host localhost#example.com
.
Impact
This has the same impact as the previous issue, with the added difficulty of having to convince a DNS client to resolve a hostname containing a #
character.
In addition to that, the URL normalization turns http://localhost#example.com
into http://localhost/#example.com
, so the issue is mitigated in practice.
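The WHATWG behavior (Node's URL class here) shows both the correct delimiting and the normalization just mentioned:

```javascript
// "#" terminates the host and starts the fragment; serialization
// inserts the "/" path that defeats Bun's buggy parse.
const fragged = new URL("http://localhost#example.com");
console.log(fragged.hostname); // "localhost"
console.log(fragged.hash);     // "#example.com"
console.log(fragged.href);     // "http://localhost/#example.com"
```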
Someone might argue this is not an issue because of the normalization, and there is no need to add this check. The first counter-argument is that, since ?
is used, and it gets the same normalization treatment, #
should be added for consistency.
In addition to that, relying on the normalization is not necessarily a good approach, as it puts a requirement on the caller, and we cannot guarantee that it is performed.
@ can be used as a username-password separator
Maybe you already spotted it in the previous code snippet, but the current implementation of the parser allows the @
character to be used as a separator between the username and the password, as well as between the password and the host. So http://user@password@example.com
would result in a “Basic auth” with username user
and password password
to the host example.com
. This is obviously not standard behavior.
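For reference, the standard WHATWG behavior (Node's URL class here) is to treat only the last @ as the userinfo/host separator, percent-encoding the earlier ones:

```javascript
// Only the LAST "@" delimits userinfo from host; the first one
// becomes %40 inside the username, and there is no password.
const doubled = new URL("http://user@password@example.com");
console.log(doubled.username); // "user%40password"
console.log(doubled.password); // ""
console.log(doubled.hostname); // "example.com"
```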
The reason this works is that there is always an attempt at parsing the password after parsing the username:
if (first_at > first_colon and first_at < (strings.indexOfChar(base[offset..], '/') orelse std.math.maxInt(u32))) {
offset += url.parseUsername(base[offset..]) orelse 0;
offset += url.parsePassword(base[offset..]) orelse 0;
}
This happens even if the username parsing stopped after encountering a @
character, which indicates the start of the hostname, and therefore a password-less login.
pub fn parseUsername(url: *URL, str: string) ?u31 {
// ... omitted for brevity ...
for (0..str.len) |i| {
switch (str[i]) {
':', '@' => {
// we found a username, everything before this point in the slice is a username
url.username = str[0..i];
return @intCast(i + 1);
},
// if we reach a slash or "?", there's no username
'?', '/' => {
return null;
},
else => {},
}
}
return null;
}
Note that, while every other implementation of fetch()
raises an exception when basic credentials are included, Bun accepts them but silently ignores them (#17435). This is technically a violation of the spec. So even if this could survive the normalization process (which it doesn’t), it would still be useless.
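The spec-mandated behavior is easy to check in Node, whose fetch follows it:

```javascript
// The Fetch spec requires constructing a request from a URL that
// includes credentials to throw a TypeError.
let threw = false;
try {
  new Request("http://user:password@example.com/");
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw); // true
```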
Extra @s are pushed to the hostname
Expanding on the previous issue, what happens if we keep adding @
chars? We start pushing stuff into the hostname:
http://user@password@extra@example.com
This URL would be parsed as having username user
, password password
, and hostname extra@example.com
. This has the same implications as the very first issue reported, but in this case the WebKit normalization breaks it completely.
“Debugging” Zig
Debugging and testing all of this is not that straightforward. Given that the custom URL parser is not exposed, and Bun always† normalizes with WebKit first, I had to write some Zig code to access it directly and test it.
First things first, we need to understand how to build the code for Bun. The contributing page includes a guide for installing all the dependencies, but I had some issues installing LLVM/Clang 18 on my system, so I created a Dockerfile for an environment ready to build Bun:
FROM ubuntu:22.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update
RUN apt-get install -y curl wget lsb-release software-properties-common \
cargo ccache cmake git golang libtool ninja-build pkg-config rustc \
ruby-full xz-utils
# CMAKE
WORKDIR /root
RUN apt-get install -y build-essential libssl-dev
RUN wget https://cmake.org/files/v3.29/cmake-3.29.2.tar.gz
RUN tar -xzvf cmake-3.29.2.tar.gz
WORKDIR /root/cmake-3.29.2
RUN bash bootstrap
RUN make -j$(nproc)
RUN make install
WORKDIR /root
RUN curl -fsSL https://bun.sh/install | bash
RUN wget https://apt.llvm.org/llvm.sh -O llvm.sh
RUN bash llvm.sh 16 all
If you want to use this, you can just run the container with the Bun source directory mounted, and build as usual with bun run build
.
Hopefully this won’t desync too soon with the correct building steps and dependencies, but in case this doesn’t work, you can always check the contributing page for the updated instructions.
Now, what I wanted to do was to write a simple main.zig
file using the URL parser, and use it to test and manually fuzz it. The very first issue with this plan is that I had no idea what I needed to import, nor how to properly build my custom code.
So, the next idea was to take the already existing main.zig
file, replace the main()
function, and run bun run build
, hoping that this way I would have everything I needed.
The issue is that Zig is very annoying: just like Go, it won’t let you have unused variables, switch cases must handle every possible value, etc… It’s far from easy when some of these unused variables are allocators too, passed through a lot of functions, and you don’t know anything about them.
Build Summary: 2/5 steps succeeded; 1 failed
obj transitive failure
├─ zig build-obj bun-debug Debug x86_64-linux-gnu.2.27 1 errors
└─ install generated to bun-zig.o transitive failure
└─ zig build-obj bun-debug Debug x86_64-linux-gnu.2.27 (+2 more reused dependencies)
src/cli.zig:1806:9: error: switch must handle all possibilities
switch (tag) {
^~~~~~
Maybe I did something wrong, but it also felt very aggressive in trying to remove all unused code, failing to include some of the external C implementations. Every piece of Bun code I tried removing translated into a build error. Maybe this isn’t even a Zig issue, and it comes from the way bun run build
is configured, I have no idea.
ld.lld: error: undefined symbol: BakeProdLoad
>>> referenced by BakeGlobalObject.cpp:112 (/app/bun/build/debug/../../src/bake/BakeGlobalObject.cpp:112)
>>> CMakeFiles/bun-debug.dir/src/bake/BakeGlobalObject.cpp.o:(Bake::bakeModuleLoaderFetch(JSC::JSGlobalObject*, JSC::JSModuleLoader*, JSC::JSValue, JSC::JSValue, JSC::JSValue))
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.
cmake took 4.97 minutes
Command exited: code 1
Adding another layer of frustration, the build process uses a lot of memory; when it fills all my RAM and the kernel kills the process, the result is a compilation error that makes it look like either the code or its dependencies are broken. It took me a while to understand that it was just the OOM killer doing its job, and that I hadn’t broken the source.
Given that all these attempts took quite some time to test, as the compilation process is not really fast on my laptop, in the end I decided to keep every little bit of the Bun code, and hijack an existing CLI command to run my tests. In particular, the bun discord
command just prints a link, so I could add the code I wanted to test to its implementation:
switch (tag) {
.DiscordCommand => {
testURL("http://localhost");
testURL("http://localhost@example.com");
testURL("http://localhost@example.com:1337");
testURL("http://localhost#example.com");
testURL("http://localhost#example.com:1337");
testURL("http://[testinutile]@example.com");
testURL("http://[testinutile]@example.com:1337");
testURL("http://[::ffff:127.0.0.1]@example.com");
testURL("http://[::ffff:127.0.0.1]@example.com:1337");
testURL("http://[::ffff:127.0.0.1]/@example.com");
testURL("http://[::ffff:127.0.0.1]/@example.com:1337");
testURL("http://[::ffff:127.0.0.1]?@example.com");
testURL("http://[::ffff:127.0.0.1]?@example.com:1337");
testURL("http://[::ffff:127.0.0.1]#@example.com");
testURL("http://[::ffff:127.0.0.1]#@example.com:1337");
testURL("http://a@b@c/test");
testURL("http://a:b:c@d/test");
testURL("http://a:b:c@d:e:f@g:1337/test");
testURL("http://a@b@c@d");
return try DiscordCommand.exec(allocator);
},
.HelpCommand => return try HelpCommand.exec(allocator),
.ReservedCommand => return try ReservedCommand.exec(allocator),
pub fn testURL(u: string) void {
const url = bun.URL.parse(u);
std.debug.print("URL: {s}\n", .{u});
std.debug.print("\tHost: {s}\n", .{url.displayHost()});
std.debug.print("\tHostname: {s}\n", .{url.displayHostname()});
std.debug.print("\tUsername: {s}\n", .{url.username});
std.debug.print("\tPassword: {s}\n\n", .{url.password});
}
Yes, I know, putting random prints is not really debugging, but it’s the simplest way I found to see what was going on in the parser.