Piping Portable PixMaps

2024-01-27 (Last edited 2024-03-06)
software

TL;DR: You can pipe PNM images directly into FFmpeg.

I enjoy the Advent of Code (AoC) puzzles[1], but I usually can’t keep up in real time in December. Instead, I like to come back to them whenever I want a break and try to learn new programming languages, data structures, and patterns of programming.

Many of the problems are grid-based, with ascii-art diagrams, like Day 18 of 2018:

The lumber collection area is 50 acres by 50 acres; each acre can be either open ground (.), trees (|), or a lumberyard (#) … For example, suppose the lumber collection area is instead only 10 by 10 acres with this initial configuration:

.#.#...|#.
.....#|##|
.|..|...#.
..|#.....#
#.#|||#|#|
...#.||...
.|....|...
||...#|.#|
|.||||..|.
...#.|..|.

I usually try to match these diagrams for quick validation with the examples and to help debug my solutions. Visualizing state is really helpful! It can be much easier to spot something unexpected visually than by comparing numbers and values in a debugger or dump. katef has some wonderful posts about quickly visualizing data and software states with diagrams and plots: “FAQ about displaying stuff in a terminal” and “Make yours and everybody else’s lives slightly less terrible by having all your programs print out their internal stuff as pictures”.

Unfortunately the ascii format has its limits: the puzzle inputs can be hundreds of cells wide and take thousands of iterations, too large for a terminal to meaningfully display and manipulate. Fortunately, there’s a common format for dense grid-based data: images, and over time, videos! Unfortunately, most image and video formats are pretty complex, and the software libraries for working with them are too. There are plenty of times where I don’t want to deal with pulling in and learning an image library, if one even exists for the language I’m using.

For those cases I’ve found a quick way to create videos with around 20 lines of straightforward code, no intermediate files, and minimal FFmpeg flags. Using it is as simple as running:

./day18ppm.ml | ffmpeg -i - /tmp/day18.webm

to create output like this:

[Video: Iteration of my solution for 2018, Day 18's cellular automaton. The green, brown, and black pixels represent forest, lumberyard, and empty acres respectively.]
[Video: The example from 2018, Day 17.]
[Video: My solution for 2018, Day 17.]

This method relies on a few things:

  • The wondrous Unix pipe, which makes it easy to pass data between specialized programs.
  • PPM (Portable PixMap), “a lowest common denominator color image file format” that is simple to write and widely supported.
  • FFmpeg and its image2pipe demuxer, which automatically detects a stream of images.
  • mpv, a video player that supports frame-by-frame stepping, pixel-perfect zooming, and panning.

Portable PixMaps

The Netpbm project defines several related image formats and over 350 tools to manipulate them and convert to and from other image formats.

The main three image formats are:

  • PBM: Portable BitMap (black/white)
  • PGM: Portable GrayMap (grayscale)
  • PPM: Portable PixMap (color)

PNM (Portable aNyMap) is not a format itself, but is used to refer to all of the above formats[3].

Originally created to send images when emails only supported 7-bit ascii text, each format can be written in plaintext. These are the contents of a 16x4 grayscale image (in PGM format) that steps horizontally from black to white, created with pgmramp -lr 16 4 | pnmnoraw > ramp.pgm[4]:

P2
16 4
255
0 17 34 51 68 85 102 119 136 153 170 187 204 221 238 255
0 17 34 51 68 85 102 119 136 153 170 187 204 221 238 255
0 17 34 51 68 85 102 119 136 153 170 187 204 221 238 255
0 17 34 51 68 85 102 119 136 153 170 187 204 221 238 255

You can open ramp.pgm in any text editor: vim, Notepad, ed, etc.

Going through each line of the file:

  1. P2 is the “magic number” identifying the PGM “plain” format.
  2. 16 4 is the width and height of the image in pixels.
  3. 255 is the maximum value (white) for any pixel; 0 (black) is always the minimum.
  4. The rest of the file is the pixel values from left-to-right and top-to-bottom.

The number of spaces and newlines is not important, as long as each value in the file is separated by whitespace.
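Here is a rough sketch of producing the same file contents by hand in OCaml. The function name plain_pgm_ramp and the linear formula for the gray values are my own assumptions for illustration, not how pgmramp is implemented; they just happen to reproduce the 16x4 example.

```ocaml
(* Build the text of a "plain" PGM (magic number P2): a horizontal ramp
   from black (0) to white (255). [plain_pgm_ramp] is a hypothetical
   helper, not part of Netpbm. *)
let plain_pgm_ramp ~width ~height =
  let maxval = 255 in
  let buf = Buffer.create 256 in
  (* Header: magic number, then width and height, then the maximum value. *)
  Printf.bprintf buf "P2\n%d %d\n%d\n" width height maxval;
  for _row = 1 to height do
    for x = 0 to width - 1 do
      (* Assumed formula: spread the values evenly across the row. *)
      let v = if width > 1 then x * maxval / (width - 1) else 0 in
      if x > 0 then Buffer.add_char buf ' ';
      Buffer.add_string buf (string_of_int v)
    done;
    Buffer.add_char buf '\n'
  done;
  Buffer.contents buf

let () = print_string (plain_pgm_ramp ~width:16 ~height:4)
```

Any whitespace between values would do; one row per line is only there to keep the output readable.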

Converted to the PNG format with pgmramp -lr 16 4 | pnmtopng > ramp.png, it looks like this (scaled up):

The PPM format is similar, but with 3 values (RGB) for each pixel.

The Netpbm Wikipedia page has a nice overview of the formats, and the man pages for the formats are thorough: ppm(5) (web version).

The worst things about the format are:

  • No standard gamma and color space metadata. Some tools expect/create images with linear values, others the “CIE Rec. 709” gamma and colorspace mentioned in the specs, and others sRGB. While ambiguous gamma/colorspace handling isn’t unique to Netpbm, it would be nice to have a way to specify it.
  • Having to remember what the various converters are called (is it ppmtopng or pnm2png? Neither, pnmtopng supports pbm, pgm, and ppm formats)
  • The various formats and bit-depths make it simple to write but more complicated to parse[5].
  • The large uncompressed file size.
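To give a sense of the parsing side, here is a minimal sketch of a reader for only the “plain” PGM variant. The name parse_plain_pgm is hypothetical, and the sketch deliberately ignores # comments, the “raw” variants, and the other formats, which are the parts that make a complete parser longer.

```ocaml
(* Parse a "plain" PGM (P2) image held in a string.
   Returns (width, height, maxval, pixels) with pixels in left-to-right,
   top-to-bottom order. Assumption: no '#' comments in the input. *)
let parse_plain_pgm text =
  let ib = Scanf.Scanning.from_string text in
  (* The leading space in each format skips any amount of whitespace. *)
  let magic = Scanf.bscanf ib " %s" (fun s -> s) in
  if magic <> "P2" then failwith "not a plain PGM file";
  let next_int () = Scanf.bscanf ib " %d" (fun n -> n) in
  let width = next_int () in
  let height = next_int () in
  let maxval = next_int () in
  let pixels = Array.make (width * height) 0 in
  for i = 0 to Array.length pixels - 1 do
    pixels.(i) <- next_int ()
  done;
  (width, height, maxval, pixels)

let () =
  let w, h, maxval, pixels = parse_plain_pgm "P2\n2 2\n255\n0 85\n170 255\n" in
  Printf.printf "%dx%d, maxval %d, %d pixels\n" w h maxval (Array.length pixels)
```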

If you want to create images or videos and your programming language doesn’t have an image library or you don’t want to use it for a simple debugging feature, writing in a PNM format is quick to implement. Instead of writing code to handle multiple image formats, tweak contrast, resize, or write frames to different files, you can use pnmto*, pnmnorm, pnmscale, or pnmsplit as well as ImageMagick and FFmpeg.

Writing PPM Images

For the Advent of Code solutions I have a short function in OCaml to write from an iterator of pixels (an 8-bit RGB pixel wrapper type is omitted):

(** Write a 24-bit RGB "raw" PPM image to [ch].
    [pixels] must be in left-to-right, top-to-bottom order.
    Exactly [width] * [height] pixels will be written: [pixels] will be
    truncated or extended with black as necessary.
    Each RGB value will be output modulo 256. *)
let ppm_raw_of_seq ch ~width ~height pixels =
  let fill = Seq.repeat (0, 0, 0)
  and output_pixel (r, g, b) =
    let ob = output_byte ch in
    ob r; ob g; ob b
  in
  Printf.fprintf ch "P6\n%d %d\n255\n" width height ;
  Seq.append pixels fill
  |> Seq.take (width * height)
  |> Seq.iter output_pixel

Then with a grid data type you can write a brief wrapper like so:

let to_ppm ch { width; height; cells; } =
  let to_pixel = function
    | Open -> (0, 0, 0)       (* black *)
    | Trees -> (0, 255, 0)    (* green *)
    | Lumber -> (165, 42, 42) (* brown *)
  in
  Array.to_seq cells
  |> Seq.map to_pixel
  |> ppm_raw_of_seq ch ~width ~height

After computing each new iteration, use to_ppm stdout grid and pipe the script output to FFmpeg.
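As a self-contained sketch of that loop, the program below writes a stream of frames to stdout. Everything in it is made up for illustration: write_ppm restates the idea of ppm_raw_of_seq so the example compiles on its own, and frame and emit_frames stand in for a real puzzle's grid and step function. It assumes OCaml 4.14 or later for Seq.repeat, Seq.take, and Seq.init.

```ocaml
(* Write one 24-bit "raw" PPM (P6) frame, padding with black or
   truncating so that exactly width * height pixels are written. *)
let write_ppm ch ~width ~height pixels =
  Printf.fprintf ch "P6\n%d %d\n255\n" width height;
  Seq.append pixels (Seq.repeat (0, 0, 0))
  |> Seq.take (width * height)
  |> Seq.iter (fun (r, g, b) ->
         output_byte ch r;
         output_byte ch g;
         output_byte ch b)

(* Hypothetical animation: a white dot moving down the diagonal of a
   size x size black image, one step per frame. *)
let frame ~size t =
  Seq.init (size * size) (fun i ->
      let x = i mod size and y = i / size in
      if x = t mod size && y = t mod size then (255, 255, 255) else (0, 0, 0))

(* Frames are simply concatenated; no separator is written between them. *)
let emit_frames ch ~size ~count =
  set_binary_mode_out ch true; (* only matters on platforms that translate newlines *)
  for t = 0 to count - 1 do
    write_ppm ch ~width:size ~height:size (frame ~size t)
  done;
  flush ch

let () = emit_frames stdout ~size:8 ~count:16
```

Saved under a hypothetical name like frames.ml, it could be run as ocaml frames.ml | ffmpeg -i - /tmp/dot.webm.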

FFmpeg

FFmpeg is a great open-source tool to convert and manipulate video and audio files.

Unfortunately, audio and video formats are complicated and the FFmpeg CLI reflects that. While you can make a video with no other command line flags, there are some important things to consider.

If each pixel in every frame matters (like when you’re debugging some AoC behavior), you should take extra care to use lossless compression and preserve all frames:

  • -crf 0 or -lossless 1: use lossless compression, preserving each frame perfectly
    • the lossless variant of mp4/H.264 is not as widely supported, and specifying it will override other codec options like -vcodec libx264 -pix_fmt yuv420p
  • -r 5 or -framerate 5 (image2-specific): sets the framerate without removing frames. It must be placed before the -i flag so that it applies to the input rather than the output (FFmpeg calls this an “input option”)
    • image2 defaults to 25 FPS
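    • -fpsmax 5 or a setpts filter that compresses timestamps, such as -filter:v 'setpts=PTS/12.5', can drop frames when speeding up the video (multiplying PTS instead slows the video down)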

Another thing to watch out for with pixel-perfect frames is that your video player will probably blur them when scaling the video to fit your screen in a process called interpolation. You can try setting the zoom to “100%” or looking for a “nearest neighbor”/“aliasing” setting (with mpv you can use --scale=nearest). Unfortunately there’s no nice way to force nearest-neighbor interpolation with web videos.

If your player can’t scale nicely, you can scale the video file itself with:

  • -vf 'scale=iw*4:ih*4:flags=neighbor': scale video by 4x without blurring
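If you would rather not rely on FFmpeg for this, the same nearest-neighbor idea can be applied to the grid before writing it out. This is a different technique from the scale filter, sketched under the assumption that the cells live in a flat, row-major array; upscale_nearest is a hypothetical helper name.

```ocaml
(* Nearest-neighbor upscaling by an integer factor: every source cell
   becomes a factor x factor block of identical cells.
   [cells] is row-major with width * height elements. *)
let upscale_nearest ~factor ~width ~height cells =
  if factor < 1 then invalid_arg "upscale_nearest: factor must be >= 1";
  if Array.length cells <> width * height then
    invalid_arg "upscale_nearest: cells does not match width * height";
  let w = width * factor and h = height * factor in
  Array.init (w * h) (fun i ->
      (* Map each output coordinate back to the source cell that covers it. *)
      let x = i mod w / factor and y = i / w / factor in
      cells.((y * width) + x))

let () =
  let big = upscale_nearest ~factor:2 ~width:2 ~height:1 [| '.'; '#' |] in
  (* The result is 4 cells wide and 2 cells tall. *)
  Array.iteri
    (fun i c ->
      print_char c;
      if i mod 4 = 3 then print_newline ())
    big
```

The scaled image is factor squared times larger, so for long runs it is usually cheaper to let FFmpeg or the player do the scaling.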

Then there are some niceties if you’re remaking the same video over and over again:

  • -y: overwrite the output file without manual confirmation
  • -hide_banner: removes the extensive version and build info printed by default

Usage

To make the first video above I used this:

ffmpeg -r 12.5 -i - -crf 0 -vf 'scale=iw*4:ih*4:flags=neighbor' /tmp/2018day18input.webm
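If you rebuild these commands often, the flag ordering can be captured in a small helper. The sketch below only assembles an argument list and does not run anything; ffmpeg_args and its parameters are hypothetical names, and the flags are the ones discussed in this post.

```ocaml
(* Assemble an FFmpeg argument list for a lossless video read from stdin.
   Input options such as -r must come before -i to apply to the input. *)
let ffmpeg_args ?(fps = 25.0) ?(scale = 1) ~output () =
  List.concat
    [ [ "ffmpeg"; "-hide_banner"; "-y" ];
      [ "-r"; Printf.sprintf "%g" fps ]; (* input option: before -i *)
      [ "-i"; "-" ];                     (* read the image stream from stdin *)
      [ "-crf"; "0" ];                   (* lossless compression *)
      (if scale > 1 then
         [ "-vf";
           Printf.sprintf "scale=iw*%d:ih*%d:flags=neighbor" scale scale ]
       else []);
      [ output ] ]

let () =
  ffmpeg_args ~fps:12.5 ~scale:4 ~output:"/tmp/2018day18input.webm" ()
  |> String.concat " "
  |> print_endline
```

Keeping the arguments as a list rather than one string avoids shell quoting problems if the list is later handed to something like Unix.create_process.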

If generating the frames takes a long time, you can pipe the stream to your favorite compressor:

./slow | zstd > /tmp/example.ppms.zst

and then reuse the compressed file to play with the ffmpeg flags:

unzstd < /tmp/example.ppms.zst | ffmpeg -i - -crf 0 -y /tmp/example.webm

While the uncompressed ppm streams can quickly blow up in size, file compression and video formats (even lossless) can bring them down to reasonable sizes again:

$ ls -sh /tmp/example.*
3.9M example.ppms
244K example.ppms.zst
 84K example.webm

Making videos that work well across devices is a whole other can of worms that I can’t offer much advice for. As far as lossless video formats go, webm seems to have better cross-platform support than mp4/H.264.

I got this error in Firefox when trying to play lossless H.264 on my laptop:

Media resource file:///tmp/input.mp4 could not be decoded, error: Error Code: NS_ERROR_DOM_MEDIA_FATAL_ERR (0x806e0005) Details: auto mozilla::SupportChecker::AddMediaFormatChecker(const TrackInfo &)::(anonymous class)::operator()() const: Decoder may not have the capability to handle the requested video format with YUV444 chroma subsampling.

Unfortunately, GitHub doesn’t support webm videos embedded in markdown, but you can just add the .mp4 file extension that GitHub accepts and the web browser will figure out the right format…


  1. See also Protohackers, Advent(2), Hanukkah of Data, and The Command Line Murders. ↩︎

  2. Fun fact: there are only three named CSS colors with a 24-bit binary representation that is printable ascii (byte values 32-126 in decimal): darkolivegreen (“Uk/”), dimgray (“iii”), and darkslategray (“/OO”). ↩︎

  3. There’s also PAM (Portable Arbitrary Map), which is a related format that’s flexible enough to support all of the other formats’ color types as well as transparency. It is a little more complex so I won’t talk about it here. Generally tools that support PAM also support all of the PNM formats. ↩︎

  4. Each PNM format has two ways of writing pixel data: plaintext (“plain”) and binary (“raw”). Most of the Netpbm tools output the raw variant, and pnmnoraw converts to “plain”. For PPM, a single pixel of the color dimgray in the “plain” format can take up 12 bytes/characters (in ascii “105 105 105 ”), but in the “raw” format only 3 bytes (in ascii “iii”[2]). Not all bytes are valid plaintext, so the “raw” format generally can’t be edited in a text editor or sent in email messages. ↩︎

  5. The farbfeld format takes a different approach with a single binary representation (16-bit RGBA, sRGB colorspace), but FFmpeg doesn’t support it (yet). ↩︎