A tale of two command arguments

A common issue I see in platform abstraction libraries is complete ignorance of fundamental incompatibilities between platforms. Let's look at an example case study: the arguments to the main function in C and C++.

For starters, what text encoding do they use? Do you know off the top of your head? Better yet, where do they even come from? What is the purpose and source of the provided arguments? Do they come from a user? From the operating system? From a mix of the two? Do you really know what a locale is? Perhaps you think you know all these answers. There was once a time when I thought I did. Can you really be confident in your answers to these questions when you consider all the platforms you hope to develop for?

Consider this: on desktop operating systems, program arguments are often used for passing filesystem paths. The NTFS filesystem used by Windows stores file and directory names as arbitrary sequences of 16-bit numbers. Aside from a few banned numbers and sequences with special meanings, almost any 16-bit number is allowed in any combination. Typically we expect these to be UTF-16 strings, but this does not have to be the case. Even case sensitivity can be enabled or disabled on a directory-by-directory basis on an NTFS volume.

So what do you think happens when someone on Windows double-clicks a file associated with your program? What gets sent to the main function of your C++ app? Arbitrary 16-bit values do not fit neatly into arbitrary 8-bit values. Even if you use a different entry point like the "wide" main or the Windows main function, if you ever convert file paths to UTF-8 internally you will run into the problem of how to handle file or directory names that don't contain valid UTF-16. Do you store paths in UTF-8 configuration files? And what will you do when showing error messages, where you have to combine paths with human-readable text? Worse, what will you do when you need to make an input control that can work for both paths and human text? Did you know Linux allows newlines and backslashes in file and directory names?
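To make "names that don't contain valid UTF-16" concrete, here is a minimal sketch of a well-formedness check. The function name `is_well_formed_utf16` is made up for illustration, and the name is assumed to have already been read into a `std::u16string`. The only thing that makes a sequence of 16-bit units ill-formed is a surrogate that is not part of a correctly ordered high/low pair.

```cpp
#include <cstddef>
#include <string>

// Returns true when every surrogate code unit in `name` belongs to a
// correctly ordered high/low pair, i.e. the sequence is well-formed UTF-16.
// (Hypothetical helper for illustration; not part of any platform API.)
bool is_well_formed_utf16(const std::u16string& name) {
    for (std::size_t i = 0; i < name.size(); ++i) {
        const char16_t unit = name[i];
        if (unit >= 0xD800 && unit <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 1 >= name.size()) return false;
            const char16_t next = name[i + 1];
            if (next < 0xDC00 || next > 0xDFFF) return false;
            ++i;  // Skip the low half of the pair.
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            // Low surrogate with no high surrogate in front of it.
            return false;
        }
    }
    return true;
}
```

A name that fails this check has no UTF-8 representation, so any code path that insists on UTF-8 has to either reject it, mangle it, or invent a private encoding for it.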

Perhaps you decide that such names are not something you want to bother supporting, and you publish documentation for that stance. Microsoft seems to agree, as their new UTF-8 API support in Windows just continues on past conversion errors when a path name is not valid UTF-16, resulting in path not found errors. This ignorant behavior of the UTF-8 APIs is not documented though, so perhaps it may change in the future. Worth noting for comparison, in WSL they use the private-use region of Unicode code points to store parts of filenames that are valid on Linux but invalid on Windows when you view them through the projected filesystem, so a similar approach might be possible with the UTF-8 APIs.

Anyway, say you've documented that you don't support path names that contain invalid UTF-16 and you're using the new fancy Windows functionality that lets you do everything in UTF-8 thanks to automatic conversions in the API managed by Microsoft. Great! Except, what if your software is subject to security considerations? If you ever try to self-execute - that is, start a new instance of the same executable that is currently running - you might be in for a rude surprise. Try giving your app executable a name containing invalid UTF-16, then create an unrelated executable whose name is the result of converting that name to UTF-8 and back to UTF-16 again, and place them in the same directory. Guess which one will get executed when your app tries to invoke itself? It's a mystery, for it depends on where you pulled your executable name from (for example, the first argument to the main function), how you did so, and whether you converted it along the way. The security implications of this are left as an exercise to the reader. Almost makes you want the horror that is `fork` from Unix systems. Don't think Linux is immune to this type of issue either, though: some filesystems used on Linux perform Unicode normalization, others don't, and some collate but don't normalize. Better hope your configuration files with paths aren't being normalized by some intermediate step, as I've seen in the wild.
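The round-trip collision can be sketched without touching any Windows API. The converter below is hypothetical and assumes the common convention of substituting U+FFFD for each unpaired surrogate. Nothing above says that Windows does exactly this, so treat the substitution rule as an assumption made for illustration.

```cpp
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of one Unicode scalar value to `out`.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical lossy converter: valid surrogate pairs are decoded normally,
// while every unpaired surrogate is replaced with U+FFFD (assumed behavior).
std::string utf16_to_utf8_lossy(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool high = cp >= 0xD800 && cp <= 0xDBFF;
        const bool low = cp >= 0xDC00 && cp <= 0xDFFF;
        if (high && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // Consume the low half of the pair.
        } else if (high || low) {
            cp = 0xFFFD;  // The original code unit is lost at this point.
        }
        append_utf8(out, cp);
    }
    return out;
}
```

Under that assumption, a file name containing the lone unit 0xD800 and a different file name containing a literal U+FFFD in the same position both convert to the same UTF-8 bytes. Once the name has passed through the converter, nothing downstream can tell which of the two files was meant.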

Things get worse. As it turns out, Windows does not actually have a concept of separate arguments being passed to programs. The actual interfaces only allow passing a single string of arbitrary data dubbed the "command line", and the target app is free to parse or interpret that single string in any way it sees fit. The C Runtime that ships with Visual Studio does some default parsing in order to support the standards-compliant main function arguments, but if you're using the Windows main entry point you can pass a linker parameter to disable the overhead of that unnecessary parsing, and in either case you can always obtain the original command line string and do your own parsing. Many apps do their own parsing instead of relying on the C Runtime's default behavior. For all anyone cares, the command line could be a JSON document or binary data that happens to not contain a zero. Even consecutive unquoted spaces are preserved and could alter behavior.
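To show what "default parsing" means in practice, here is a simplified sketch of the backslash-and-quote convention documented for Microsoft's C runtime. The function name `split_command_line` is invented for this example. It deliberately leaves out details of the real runtime, such as the separate rules for the program name and for doubled quotes inside a quoted section, so read it as an approximation rather than a faithful reimplementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified, hypothetical splitter following the documented convention:
//  - spaces and tabs separate arguments unless inside double quotes
//  - 2n backslashes followed by a quote yield n backslashes, and the quote
//    toggles quoted mode
//  - 2n+1 backslashes followed by a quote yield n backslashes and a literal quote
//  - backslashes not followed by a quote are taken literally
std::vector<std::string> split_command_line(const std::string& cmd) {
    std::vector<std::string> args;
    std::string current;
    bool in_quotes = false;
    bool have_arg = false;  // Lets an empty "" still count as an argument.
    std::size_t i = 0;
    while (i < cmd.size()) {
        const char c = cmd[i];
        if (c == '\\') {
            std::size_t run = 0;
            while (i < cmd.size() && cmd[i] == '\\') { ++run; ++i; }
            if (i < cmd.size() && cmd[i] == '"') {
                current.append(run / 2, '\\');
                if (run % 2 == 1) {
                    current += '"';  // Escaped quote: literal character.
                    ++i;
                }
                // Even run: the quote is handled by the next iteration.
            } else {
                current.append(run, '\\');
            }
            have_arg = true;
        } else if (c == '"') {
            in_quotes = !in_quotes;
            have_arg = true;
            ++i;
        } else if ((c == ' ' || c == '\t') && !in_quotes) {
            if (have_arg) {
                args.push_back(current);
                current.clear();
                have_arg = false;
            }
            ++i;
        } else {
            current += c;
            have_arg = true;
            ++i;
        }
    }
    if (have_arg) args.push_back(current);
    return args;
}
```

Note that this splitting throws away information: runs of unquoted spaces collapse into a single separator. A program that reads the raw command line string can still see them, which is one way two programs can disagree about what they were given.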

Linux, on the other hand, specifically separates arguments and keeps them separate, with no single command line. Each individual shell or terminal is responsible for parsing a command line into separate arguments, and programs otherwise don't get to do any manual interpretation of the command line. This is a fundamental incompatibility between Windows and Linux that makes me cautious about any platform abstraction library that provides separated arguments for all platforms. It might work well enough for most cases, but one day someone might have an issue they can't understand the cause of because they have no idea that Windows does not pass arguments separately.

For example, there was a recent CVE related to programming languages improperly escaping arguments when executing a batch file. The underlying problem is the attempt to abstract something in a way that does not make sense: Windows apps can each have their own method of escaping arguments (or even no escape support at all) since there's no such thing as separate arguments at the OS level, so there is no generic solution to escaping arguments when executing programs on Windows. It has to be considered separately for each program that is executed, and in the case of the CVE, it was the cmd.exe command processor expecting different escaping behavior for arguments. Existing libraries with this design flaw now have to play a game of fixing each newly-discovered improper argument escape for each new program that expects arguments to be escaped in its own special way, or (ideally) clearly documenting that their argument escaping only works for specific use cases and has the potential to be wrong.
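As an illustration of why one escaping routine cannot cover every target, here is a sketch of the quoting that suits programs relying on the Microsoft C runtime's documented backslash-and-quote convention. The name `quote_argument` is hypothetical. Its output is only correct for targets that parse their command line that way, and it is not safe for cmd.exe or batch files, which interpret additional metacharacters by their own rules.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: quotes one argument so that a parser following the
// documented C runtime convention recovers it unchanged.
// Assumption: the target program really does use that convention.
std::string quote_argument(const std::string& arg) {
    // Arguments without whitespace or quotes can be passed through as-is.
    if (!arg.empty() && arg.find_first_of(" \t\n\v\"") == std::string::npos) {
        return arg;
    }
    std::string out = "\"";
    std::size_t backslashes = 0;  // Length of the current backslash run.
    for (const char c : arg) {
        if (c == '\\') {
            ++backslashes;
        } else if (c == '"') {
            // Double the pending backslashes, then escape the quote itself.
            out.append(backslashes * 2 + 1, '\\');
            out += '"';
            backslashes = 0;
        } else {
            // Backslashes not followed by a quote stay literal.
            out.append(backslashes, '\\');
            out += c;
            backslashes = 0;
        }
    }
    // Double trailing backslashes so they do not escape the closing quote.
    out.append(backslashes * 2, '\\');
    out += '"';
    return out;
}
```

A library that exposes something like this would do well to name the convention it implements, so that callers launching a program with different parsing rules know they need a different routine.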

Things continue getting worse the deeper we go. Do you think the concept of arguments being passed to your entry point applies to game consoles? And if so, how? Is the Steam Deck considered a game console? Because it certainly provides an experience like one, and yet it allows players to freely edit the command line arguments that games are launched with, while also simultaneously supporting games built for Windows and for Linux. But no such ability to edit launch arguments exists on other popular game consoles, at least not as far as retail users can see. Do other game consoles use command arguments for some other purpose (such as development), or not at all? Is your platform abstraction library going to make game devs happy or frustrated?

The strength of a platform abstraction library is its ability to quickly get you up and running on a new platform, but you're likely going to still need to special case a bunch of things, and the measure of a platform abstraction library is how well it eases that transition from abstracted to specialized code. I often find myself annoyed when a library provides no recourse for wanting to customize or replace behavior, short of editing the library code itself. It's also annoying when documentation doesn't clearly explain how things are actually abstracted for each platform and how you might want to make refinements for each platform. When the expectation is that refinements aren't necessary, the result is that refinements are impossible.

A good platform abstraction library would provide multiple levels of abstraction, ranging from exactly matching the underlying platform's way of doing things to a generic approach that lets you write code that works for all platforms. Back to our command arguments case study, this could mean the generic approach is the naive approach, relying on the C Runtime's argument parsing and UTF-8 conversion on Windows, and then guarded code could be allowed to access the wide character argument list (also prepared by the C Runtime), as well as the original raw command line in both wide and narrow flavors. This makes it easy to quickly and/or gradually transition from code that isn't behaving as expected to code that has a better understanding of reality without having to dig through source files and documentation: the library eases the process of coding closer to the platform when needed, without requiring the added effort of going all the way.
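A rough sketch of what such layering might look like follows. The `launch` namespace and every name inside it are invented for illustration. The only real platform calls are `GetCommandLineW` and `GetCommandLineA`, and those are compiled only on Windows.

```cpp
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#endif

namespace launch {  // Hypothetical library namespace.

// Level 1: the generic view, identical on every platform.
struct Arguments {
    std::vector<std::string> values;
};

// Builds the generic view from whatever the runtime handed to main.
inline Arguments from_main(int argc, char** argv) {
    Arguments result;
    for (int i = 0; i < argc; ++i) {
        result.values.emplace_back(argv[i]);
    }
    return result;
}

#ifdef _WIN32
// Level 2: escape hatches that expose what the platform actually provides.
// They only exist on Windows, so callers must guard their use the same way.
inline const wchar_t* raw_command_line_wide() { return ::GetCommandLineW(); }
inline const char* raw_command_line_narrow() { return ::GetCommandLineA(); }
#endif

}  // namespace launch
```

Code that starts out on `launch::from_main` can later add a guarded block that reads the raw command line for the one case that misbehaves, without abandoning the library or patching it.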

Full ignorance of the nuances of each platform can let developers fall into pits and get trapped without a ladder out, forcing them to dig out by hand. You could argue there is a small benefit to a platform abstraction that traps the developer: it results in less code to maintain, since there is no possibility to have special cases for certain platforms. However, I'd argue that this sort of thing is better addressed in project coding standards instead, since those can flex more easily than complex platform abstraction libraries. When there's a critical bug on your plate, would you rather ask permission to fix it with ease, or figure out how to modify or sidestep the platform abstraction library you're now stuck with?

If you're tired of the command arguments case study, let's look at something more gaming-focused. SDL is a popular platform abstraction library commonly used by games and other applications, and one of its features is managing native platform windowing on desktop-like platforms. For years across two major versions, it has had a bug on Windows where if you click and hold on the title bar, or start resizing the window, the game freezes. This is because they made a design mistake of running the window event pump on the same thread as the game logic and rendering. On Windows, dragging and resizing a window results in recursive message loops in the event pump, so the top-level event loop is effectively blocked until the move or resize is done. Blame Windows all you like, but this behavior was known and documented, and ignored by SDL for one reason or another. It looks like SDL3 is making considerable changes to the library design so that the window event pump can finally run in its own separate thread, as it should have from the start.
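The general shape of the separate-thread design can be sketched with a small hand-off queue. This is not how SDL implements anything. `WindowEvent` and `EventMailbox` are hypothetical names, and thread creation is left out to keep the sketch short. The idea is only that the thread pumping window messages posts events, while the game thread drains them once per frame and never waits on the operating system.

```cpp
#include <mutex>
#include <vector>

// Hypothetical event record; a real library would carry far more detail.
struct WindowEvent {
    int kind;
    int width;
    int height;
};

// Hand-off point between a window thread (producer) and a game thread (consumer).
class EventMailbox {
public:
    // Called by the thread that owns the window and pumps its messages.
    void post(const WindowEvent& event) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(event);
    }

    // Called once per frame by the game thread. It takes everything posted
    // so far and returns immediately, even if the window thread is stuck
    // inside a modal move or resize loop.
    std::vector<WindowEvent> drain() {
        std::vector<WindowEvent> batch;
        std::lock_guard<std::mutex> lock(mutex_);
        batch.swap(pending_);
        return batch;
    }

private:
    std::mutex mutex_;
    std::vector<WindowEvent> pending_;
};
```

With this split, a blocked message loop only delays the delivery of new window events. Game logic and rendering keep running on their own thread with whatever state they last saw.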

This particular oversight in how the Windows window event pump works is not restricted to SDL. Minecraft's Java Edition also suffers from the same issue, and it even allows for exploit-like behavior in multiplayer contexts. Try this: connect two Minecraft Java Edition clients, each on a separate device, to the same server. Have client A press and hold the mouse on the window title, and don't move the mouse. Then have client B create an explosion below client A. You might notice client A is floating above the hole from client B's perspective. Now, release the mouse from the window title bar, and client A will fly upward as it processes the explosion quite a bit after it actually happened. Same bug as SDL, now making it trivial for players to create unexpected outcomes in Minecraft multiplayer. (In case you're wondering, Minecraft's Bedrock Edition is a UWP app, so it doesn't have this issue due to the UWP message loop working differently.)

In both cases, a platform abstraction library made a poor design decision that didn't account for a quirk of a particular platform, and resulted in clearly and obviously wrong behavior from the perspective of the user. The fact it's the same issue in both cases lends weight to the idea that the Win32 API is terrible, and I don't disagree. As I mentioned in a parenthetical, the Universal Windows Platform's message loop works differently and doesn't have the quirk that the Win32 message loop does, so clearly Microsoft realized there was something undesirable about the old approach and their redesign avoided it. Anyway, let's look at how SDL handled the situation prior to SDL3.

For a long time, SDL had no real workaround for the design flaw. Eventually, they added official support for adding a callback that gets called during the window move and resize loop, where you can run game logic and render the game. This sort of works, since it allows the app to continue updating while being moved or resized, but it also bogs down the window message loop, which can be directly felt by the mouse cursor not being as responsive to mouse inputs during the window move/resize. Remember what I said earlier about enabling gradual transitions from incorrect behavior to correct behavior? This is a good example of that, where we're somewhere in the middle between incorrect and correct. Well, it's still incorrect, but it at least sort of alleviates the major side effects, and with only minimal code changes. The full proper fix for fully correct behavior in SDL3 will require more significant code changes.

It's interesting to me how platform abstraction libraries both save coding time and also multiply it. The abstractions they provide make it quick and easy to get up and running with basic functionality on a variety of platforms, but here we've also seen two separate libraries make the same mistake that now has to be fixed in two separate ways despite the underlying platform being the same. Had you coded for the native Windows API directly and made the same mistake yourself, you would have had a direct path to fixing it, but using a platform abstraction library means trouble with backward compatibility delaying the fix. Such is the nature of tradeoffs, I suppose.

As with all tradeoffs, I leave it to your own best judgement to decide what abstractions are or aren't worth using for your use cases. I'm not arguing against platform abstraction libraries, and I don't blame anyone in particular for the issues I've mentioned above. I just find it fascinating to dive into these nuances that are often overlooked in the pursuit of uniformity. Hopefully I haven't scarred anyone with eldritch platform knowledge that curses their mind to forever be paranoid. Text encoding and networking can each be nightmarish on their own, after all.
