Pod Somm: Tracking Specialty Coffee with a Bookmarklet and a Lot of DOM Wrangling

I have an xbloom studio, a pod-based specialty coffee maker that grinds and brews individual capsules. The coffee is genuinely good, with rotating single origins, light roasts, and careful sourcing, but the catalog moves fast. I’d find a Burundian natural I liked and lose track of it by the time it cycled back in stock. I wanted a place to keep notes: what I bought, what it tasted like, what I paid per cup.

So I built Pod Somm.

Pod Somm is a small personal app: a Node.js/Express API, a flat JSON file as a database, and a vanilla JS frontend, all deployed on Fly.io. The part that required actual design work was getting coffees into it.

The Scraper That Couldn’t

The obvious approach was a server-side scraper. xbloom runs on Shopify, which exposes a product JSON API at /products/slug.json. Send a request, parse the response, done. It worked perfectly from my laptop.
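As a rough sketch of the first step, here is how the JSON URL can be derived from a product page URL. The helper name productJsonUrl is made up for illustration; only the /products/slug.json pattern comes from the setup described above.

```javascript
// Hypothetical helper: derive the Shopify product JSON URL from a product page URL.
function productJsonUrl(pageUrl) {
  const url = new URL(pageUrl);
  // Product pages live at /products/<slug>, sometimes nested under /collections/<name>/.
  const match = url.pathname.match(/\/products\/([^/]+)/);
  if (!match) return null; // not a product page
  const slug = match[1].replace(/\.json$/, '');
  return `${url.origin}/products/${slug}.json`;
}
```

The query string (variant IDs, tracking parameters) is dropped on purpose, since the JSON endpoint only needs the slug.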

It failed completely on Fly.io.

xbloom’s CDN is behind Cloudflare, which blocks requests from cloud datacenter IPs. My Fly.io machine lives at a datacenter IP. The product JSON API returned a 404; fetching the HTML returned a Cloudflare challenge page. I confirmed this by running the same request from my home network (fine) and from the deployed app (blocked). There’s no User-Agent trick that fixes this: Cloudflare isn’t fooled by a Mac Chrome header on a request originating from a datacenter IP range.

The server-side approach was dead.

The Bookmarklet

Moving the scraping into the browser sidesteps all of this. A bookmarklet is a javascript: URL saved as a browser bookmark. When you click it on an xbloom product page, it runs in that page’s context: same origin, real browser session, no Cloudflare friction. It can call /products/slug.json directly, parse the response, and pass the data somewhere useful.

The data travels via URL hash. The bookmarklet builds a URLSearchParams string from the product data and opens podsomm.co/#add?name=...&roaster=...&price=... and so on. Pod Somm detects the hash on load, strips it with history.replaceState, opens the add modal, and pre-fills the form.
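A minimal sketch of both ends of that handoff follows. The function names buildAddHash and parseAddHash are illustrative, not the names used in the app, and skipping empty values is my own assumption.

```javascript
// Bookmarklet side: pack the scraped fields into a hash fragment.
function buildAddHash(fields) {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(fields)) {
    // Skip empty values so the form is not pre-filled with blanks.
    if (value !== undefined && value !== null && value !== '') {
      params.set(key, String(value));
    }
  }
  return '#add?' + params.toString();
}

// App side: turn the hash back into an object, or null if it is not an add request.
function parseAddHash(hash) {
  const prefix = '#add?';
  if (!hash.startsWith(prefix)) return null;
  return Object.fromEntries(new URLSearchParams(hash.slice(prefix.length)));
}
```

In the browser, the app side would call parseAddHash(window.location.hash) on load and then history.replaceState(null, '', window.location.pathname) to strip the hash before opening the modal. Using the hash rather than the query string keeps the product data out of the request sent to the server.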

Generating the bookmarklet cleanly required one trick: the javascript: URL has to be a self-contained string with no external references. I defined the scraping function as a named function inside openBookmarkletModal(), then serialized it with .toString(), replacing a __TRACKER__ placeholder with window.location.origin before wrapping it in javascript:try{(function(){...})()}catch(e){alert(e)}. This means the bookmarklet always points to the right host whether the app is running locally or on Fly.io, and errors surface visibly instead of failing silently.
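The serialization step can be sketched like this. The scrape function is a stand-in with a made-up body, and buildBookmarklet is a hypothetical name; the real scraper does far more than read document.title.

```javascript
// Hypothetical stand-in for the scraping function; it only runs inside a browser page.
function scrape() {
  var target = '__TRACKER__';
  window.open(target + '/#add?name=' + encodeURIComponent(document.title));
}

// Serialize a function into a self-contained javascript: URL.
function buildBookmarklet(fn, origin) {
  // split/join replaces every placeholder without regex or "$" substitution surprises.
  const source = fn.toString().split('__TRACKER__').join(origin);
  // Wrap in an immediately invoked call, with try/catch so errors show up in an alert.
  return 'javascript:try{(' + source + ')()}catch(e){alert(e)}';
}
```

In the app, the second argument would be window.location.origin, and the result would go into the href of a link the user drags to the bookmarks bar. This sketch does no URL encoding of the source.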

There was one early failure: // inline comments inside the serialized function body. When a browser collapses the newlines (which bookmark storage sometimes does), a // comment eats everything that follows it on the now-collapsed line. Removing all comments from the bookmarklet function fixed it.
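The failure is easy to reproduce outside a browser. This small demonstration uses a made-up two-line function body and collapses its newline the way bookmark storage can.

```javascript
// The same function body, before and after newlines are collapsed to spaces.
const original = 'var a = 1; // set the base value\nreturn a + 1;';
const collapsed = original.replace(/\n/g, ' ');

const before = new Function(original)();  // the comment ends at the newline
const after = new Function(collapsed)();  // the comment now swallows the return

console.log(before, after); // 2 undefined
```

Block comments (/* ... */) would survive the collapse, but removing comments entirely is the simpler rule.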

The Metafield Problem

Shopify’s product JSON API gives you title, vendor, price, images, tags, and body_html. For most xbloom coffees, tags include the roast level and process (Light Roast, Washed). That’s enough for the important fields.
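Here is a sketch of mapping that response onto form fields. The roast and process vocabularies are assumptions, as is reading the price from the first variant; the function takes the product object from the response, and it accepts tags as either a comma-separated string or an array because I would not rely on one shape.

```javascript
// Assumed roast and process vocabularies; the real tag set may differ.
const ROASTS = ['Light Roast', 'Medium Roast', 'Dark Roast'];
const PROCESSES = ['Washed', 'Natural', 'Honey', 'Anaerobic'];

function fieldsFromProduct(product) {
  // Tags may arrive as a comma-separated string or as an array; accept both.
  const rawTags = Array.isArray(product.tags)
    ? product.tags
    : String(product.tags || '').split(',');
  const tags = rawTags.map((t) => t.trim()).filter(Boolean);
  // Assumption: the first variant carries the price shown on the page.
  const variant = (product.variants && product.variants[0]) || {};
  return {
    name: product.title || '',
    roaster: product.vendor || '',
    price: variant.price || '',
    roast: tags.find((t) => ROASTS.includes(t)) || '',
    process: tags.find((t) => PROCESSES.includes(t)) || '',
  };
}
```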

What it doesn’t give you is origin, variety, or altitude. Those are Shopify metafields, structured product attributes stored separately, rendered by the theme. They don’t appear in the JSON API response. They’re somewhere in the page’s rendered HTML, but how they’re rendered depends entirely on the theme, and xbloom’s theme renders them without visible text labels. The DOM has the value (Burundi) but not the label (Origin), so strategies like “find dt elements” or “find elements with class label” return nothing.

The working approach uses tasting notes as an anchor. Tasting notes reliably appear as a dot-separated string (Apricot · Cascara · Autumn Honey) that’s easy to find in the page’s leaf text nodes. Once I have that position in the leaf array, I scan a window of nodes around it for text that exactly matches a country name from a whitelist. Origin tends to appear near tasting notes in the product spec layout.
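The anchor-and-window idea, sketched over a plain array of leaf strings. The short country list, the window size of 10, and the nearest-match-first ordering are all assumptions made for the example.

```javascript
// Assumed whitelist; a real one would cover every producing country.
const COUNTRIES = ['Burundi', 'Colombia', 'Ethiopia', 'Kenya', 'Panama'];

// Tasting notes look like "Apricot · Cascara · Autumn Honey".
function isTastingNotes(text) {
  const parts = text.split('·').map((p) => p.trim());
  return parts.length >= 2 && parts.every((p) => p.length > 0);
}

function findOrigin(leaves, windowSize = 10) {
  const anchor = leaves.findIndex(isTastingNotes);
  if (anchor === -1) return null; // no anchor, so no guess
  // Walk outward from the anchor so the nearest exact match wins.
  for (let offset = 1; offset <= windowSize; offset++) {
    for (const i of [anchor - offset, anchor + offset]) {
      const text = leaves[i]; // undefined when i is out of range
      if (text !== undefined && COUNTRIES.includes(text.trim())) return text.trim();
    }
  }
  return null;
}
```

Exact matching is what keeps this from firing on marketing copy: a sentence that merely mentions Burundi is not a leaf whose entire text is Burundi.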

This required scoping the leaf scan to document.querySelector('main') rather than document.body. The full body includes a cart drawer and pre-rendered recommendation cards, which contain product data for other coffees. Scanning the whole body would find Burundi from a recommendation card before finding it from the current product. Scoping to <main> excludes off-canvas Shopify drawers without needing to know their specific class names.
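A sketch of the scoped leaf collection. collectLeafText is a hypothetical name, and the recursive walk is one of several ways to do this; it relies only on nodeType, nodeValue, and childNodes, so it works on real DOM nodes as well as plain objects shaped like them.

```javascript
// Collect trimmed text from the text nodes under root, in document order.
function collectLeafText(root) {
  const TEXT_NODE = 3; // same value as Node.TEXT_NODE in the DOM
  const leaves = [];
  (function walk(node) {
    if (node.nodeType === TEXT_NODE) {
      const text = (node.nodeValue || '').trim();
      if (text) leaves.push(text); // skip whitespace-only nodes
      return;
    }
    for (const child of node.childNodes || []) walk(child);
  })(root);
  return leaves;
}
```

On a product page the call would be collectLeafText(document.querySelector('main')), which leaves the cart drawer and recommendation cards out of the array entirely.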

Description had a similar problem: body_html from Shopify sometimes contains HTML comments left over from Word-pasted content (<!-- ... -->). The expected fix, body_html.replace(/<[^>]+>/g, ' '), does not strip comments reliably: the pattern stops at the first > inside a comment, so anything after it survives as text. In my case the comment text also still showed up when I read textContent from the parsed element. The fix is to strip comments explicitly with /<!--[\s\S]*?-->/g before setting innerHTML.
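A sketch of the cleanup order, with comments removed before anything else. To keep it runnable outside a browser, the second step uses the tag-stripping regex as a stand-in for setting innerHTML and reading the text back; htmlToPlainText is a made-up name.

```javascript
function htmlToPlainText(bodyHtml) {
  return bodyHtml
    .replace(/<!--[\s\S]*?-->/g, ' ') // drop comments first, including multi-line ones
    .replace(/<[^>]+>/g, ' ')         // then drop the remaining tags
    .replace(/\s+/g, ' ')             // collapse the whitespace left behind
    .trim();
}
```

The [\s\S]*? part matters: it matches across newlines and stops at the first closing -->, so a comment containing a > character is removed whole instead of being cut at that character.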

Cost Per Cup

One of the reasons I wanted a tracker was to compare value across coffees. A $38 bag sounds expensive; a $38 bag with 14 pods is $2.71 a cup, which is reasonable for a well-sourced single origin.

The form has two mutually exclusive fields: Pods (for xbloom capsules, where one pod is one cup) and Bag weight (g) (for ground coffee, where I assume 15 g per cup). Filling either one clears the other. Cost per cup is computed from whichever is set and displayed next to the price on both the card and detail views.
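The calculation is small enough to sketch in full. The field name bagWeightG and the rounding to cents are illustrative choices; the 15 g dose is the assumption stated above.

```javascript
const GRAMS_PER_CUP = 15; // the per-cup dose assumed for ground coffee

function costPerCup({ price, pods, bagWeightG }) {
  let cups = null;
  if (pods > 0) cups = pods;                                   // one pod is one cup
  else if (bagWeightG > 0) cups = bagWeightG / GRAMS_PER_CUP;  // bag weight in grams
  if (!cups || !(price > 0)) return null;                      // not enough data to compute
  return Math.round((price / cups) * 100) / 100;               // round to cents
}

console.log(costPerCup({ price: 38, pods: 14 })); // 2.71
```

Returning null when neither field is set lets the card and detail views hide the figure rather than show a misleading zero.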

One bug cost me time: the field values were being silently dropped by a field allowlist in the database layer that wasn’t updated when the form was. A payload arriving at the API endpoint with pods: 14 was saved as pods: null without any error. One-line fix, but silent data loss is the worst kind to track down.
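One way to make that class of bug loud is to have the allowlist report what it throws away. This is a sketch under assumed names (ALLOWED_FIELDS, pickAllowed, bagWeightG), not the code in the app's database layer.

```javascript
// Hypothetical allowlist; the bug was a list like this missing newly added form fields.
const ALLOWED_FIELDS = ['name', 'roaster', 'price', 'pods', 'bagWeightG'];

function pickAllowed(payload, allowed = ALLOWED_FIELDS) {
  const record = {};
  const dropped = [];
  for (const [key, value] of Object.entries(payload)) {
    if (allowed.includes(key)) record[key] = value;
    else dropped.push(key);
  }
  // Surface dropped keys instead of discarding them silently.
  if (dropped.length > 0) console.warn('Dropped unknown fields:', dropped.join(', '));
  return { record, dropped };
}
```

With a stale list such as ['name', 'price'], a payload containing pods: 14 now produces a warning naming pods, which would have pointed straight at the missing allowlist entry.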

Where It Is Now

The app holds a modest catalog of coffees I’ve bought or tried. Each entry has specs, tasting notes, brew notes with ratings, and a link back to xbloom. The bookmarklet covers most of the data entry: I click it on an xbloom product page and the form arrives mostly filled in, with origin and variety as the fields most likely to need manual correction.

No framework, no build step, no database beyond a JSON file on a persistent Fly.io volume. For something I’m the only user of, that’s plenty. All the design work ended up in the data ingestion layer: a bookmarklet running on xbloom.com rather than anything on my server. That is not where I expected to spend my time when I started.
