In the last post I described how I structure my interactive diagrams and gave some examples. In this post I'll describe what I'd like to do for future diagrams.

Flow diagram: controls → input → algorithm → output → visualization

When I started writing interactive diagrams I used d3.js, which is a great visualization toolkit. At the time I thought data visualization was the hardest part of what I was doing, and I needed a library for that. Since then, I've seen other libraries (React, Riot, Ember, Mithril, Intercooler, Marko, Vue, Aurelia, Ractive, Rax, Svelte, MobX, Moon, Dio, Etch, Hyperapp, S/Surplus, Preact, Polymer, lit-html, Elm, and many more) that have given me different ways to look at the problem. I've realized that the visualization isn't the hardest part of what I'm trying to do. There are two big problems I want a library to help me with:

  • How do I update the visualization?
  • Which algorithms do I need to run again?

For deciding how to update the visualization, there's a spectrum:

  1. The simplest thing is to redraw everything from scratch. I use d3 or jQuery to remove all the existing SVG nodes and create new ones in their place. If I'm using Canvas instead of SVG, redrawing everything from scratch is even more likely.
  2. If I don't want to redraw everything (for performance reasons), I can use d3.js's data joins to help me figure out which parts of the visualization to add, remove, and update. This is great for many data visualization needs, but a bit overkill for most of my projects.
  3. The next simplest thing is to pretend I'm redrawing everything from scratch, but let the system figure out what actually changed, and only update those things. This is the “virtual DOM” approach used in React and other libraries. This approach wasn't available when I first started writing interactive tutorials, and I'm only now catching up.
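A minimal sketch of what approach 3 buys me, assuming a flat list of circles keyed by a stable `id`. The names `VCircle`, `render`, and `diff` are hypothetical, and this is not React's or any other library's actual algorithm. My code only describes the whole picture from scratch; a separate `diff` step compares the old and new descriptions and produces just the add, remove, and update operations needed. Real virtual DOM libraries diff nested element trees and then apply the patches to the page; this sketch stops after computing them.

```typescript
// Sketch of "describe everything, let the system figure out what changed".
// VCircle, render, and diff are hypothetical names for illustration only.

// One SVG circle in the "virtual" description of the picture.
interface VCircle {
  key: string; // stable identity, so old and new circles can be matched
  cx: number;
  cy: number;
  r: number;
}

// The operations the system would apply to the real SVG.
type Patch =
  | { op: "add"; node: VCircle }
  | { op: "remove"; key: string }
  | { op: "update"; key: string; changes: Partial<VCircle> };

// My code: describe the whole picture from scratch, every time.
function render(points: { id: string; x: number; y: number }[]): VCircle[] {
  return points.map((p) => ({ key: p.id, cx: p.x, cy: p.y, r: 5 }));
}

// The system's job: compare old and new descriptions, emit only the changes.
function diff(prev: VCircle[], next: VCircle[]): Patch[] {
  const patches: Patch[] = [];
  const prevByKey = new Map<string, VCircle>();
  for (const node of prev) prevByKey.set(node.key, node);
  const nextKeys = new Set(next.map((node) => node.key));

  // Circles that disappeared.
  for (const node of prev) {
    if (!nextKeys.has(node.key)) patches.push({ op: "remove", key: node.key });
  }

  for (const node of next) {
    const old = prevByKey.get(node.key);
    if (old === undefined) {
      // Circles that are new.
      patches.push({ op: "add", node });
      continue;
    }
    // Circles in both: record only the attributes that changed.
    const changes: Partial<VCircle> = {};
    for (const attr of ["cx", "cy", "r"] as const) {
      if (old[attr] !== node[attr]) changes[attr] = node[attr];
    }
    if (Object.keys(changes).length > 0) {
      patches.push({ op: "update", key: node.key, changes });
    }
  }
  return patches;
}

// Usage: delete circle "a", move circle "b", add circle "c".
const before = render([
  { id: "a", x: 0, y: 0 },
  { id: "b", x: 10, y: 10 },
]);
const after = render([
  { id: "b", x: 12, y: 10 },
  { id: "c", x: 20, y: 20 },
]);
// Expect: remove "a", update "b" (cx only), add "c".
console.log(diff(before, after));
```

The part I write, `render`, looks just like approach 1; the bookkeeping lives in `diff`, which a library would provide.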

I've been using approach 2 but will be trying approach 3 for future projects. It's simpler to describe what I want than how to get there. Also, since I am writing documents instead of web applications, I prefer to describe the output in document form (HTML/SVG) instead of code form (JavaScript). However, this is just a preference, and you should follow your own.

For deciding which algorithms to run again, including calculating the visualization, there's also a spectrum:

  1. The simplest thing is to run everything. Easy!
  2. If I don't want to run everything (for performance reasons), I can manually decide what to run based on which input control was changed. This is what I do for most of my projects, but it's error prone. It was especially bad for the hexagon page.
  3. I can have the algorithms mark which inputs they depend on, and then when those inputs change, the algorithms automatically run again. I did this somewhat for the A* page, which re-runs graph search when the graph data changes.
  4. I can ask the system to track the data dependencies: which controls each input depends on, which inputs each algorithm depends on, and which algorithms each visualization depends on. Then it can automatically update the right things when a control is changed.
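A minimal sketch of approach 3 as a plain Observer pattern. The classes `Cell`, `Source`, and `Derived` and the tiny controls → input → algorithm → visualization pipeline are hypothetical illustrations, not code from the A* page. Each derived value lists its dependencies by hand and re-runs when any of them changes.

```typescript
// Approach 3 sketch: each derived value declares, by hand, which values it
// depends on. All names here are hypothetical.

class Cell<T> {
  private listeners: (() => void)[] = [];
  constructor(protected value: T) {}
  get(): T {
    return this.value;
  }
  onChange(listener: () => void): void {
    this.listeners.push(listener);
  }
  protected notify(): void {
    for (const listener of this.listeners) listener();
  }
}

// A value set directly, e.g. by a slider or other control.
class Source<T> extends Cell<T> {
  set(value: T): void {
    this.value = value;
    this.notify();
  }
}

// A value computed from other cells. The `deps` list is written by hand.
class Derived<T> extends Cell<T> {
  runs = 1; // how many times compute has run, for illustration
  constructor(deps: Cell<unknown>[], private compute: () => T) {
    super(compute());
    for (const dep of deps) dep.onChange(() => this.recompute());
  }
  private recompute(): void {
    this.value = this.compute();
    this.runs++;
    this.notify();
  }
}

// Hypothetical pipeline: controls → input → algorithm → visualization.
const sizeControl = new Source(3); // a slider
const colorControl = new Source("red"); // a color picker
const grid = new Derived([sizeControl], () => sizeControl.get() ** 2); // input
const path = new Derived([grid], () => grid.get() - 1); // stand-in algorithm
const svg = new Derived(
  [path, colorControl],
  () => `<path stroke="${colorControl.get()}" data-length="${path.get()}"/>`,
);

colorControl.set("blue"); // re-runs only svg
sizeControl.set(4); // re-runs grid, then path, then svg
```

Changing `colorControl` re-runs only `svg`, while changing `sizeControl` re-runs `grid`, `path`, and `svg` in turn. The dependency lists such as `[path, colorControl]` are written by hand, though: if `svg` read `colorControl` without listing it, nothing would complain and the picture would silently go stale. This naive version can also re-run a value more than once per change when dependencies form a diamond.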

I've recently experienced approach 4 with ObservableHQ (see my notebooks), which tracks changes between cells and automatically runs algorithms and visualizations that depend on them. It's quite refreshing! I tried Vue for this unfinished page, and it was a great fit for my needs. I've not yet tried Elm, which seems like the coolest approach for dependency tracking and automatic updates.

I've mostly been using approach 2 or 3 but will be trying approach 1 or 4 for future projects. Tracking dependencies manually, even if using the Observer pattern, is error prone. In approach 2 the controls have to know which inputs are affected, the inputs have to know which algorithms are affected, and the algorithms have to know which visualizations are affected. In approach 3 the visualizations have to declare which algorithms they care about, the algorithms have to declare which inputs they care about, and the inputs have to declare which controls they care about. Approach 3 is better than 2, but both mean I'm constructing the dependency chain. I'd rather not have to do that. So my options are either not to worry about it, or to use something that automatically handles dependency tracking.
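For comparison, a minimal sketch of automatic dependency tracking, the general idea behind Vue's reactivity system and MobX. The `Signal` and `computed` names are hypothetical; this is not either library's API. While a computation runs, every signal it reads records it as a subscriber, so the dependency chain is discovered rather than constructed by hand.

```typescript
// Approach 4 sketch: dependencies are recorded automatically by noting which
// signals a computation reads while it runs. All names are hypothetical.

type Computation = { run: () => void };

// The computation currently running, if any.
let currentComputation: Computation | null = null;

// Run fn with c marked as current, so signal reads are attributed to c.
function track<T>(c: Computation, fn: () => T): T {
  const previous = currentComputation;
  currentComputation = c;
  try {
    return fn();
  } finally {
    currentComputation = previous;
  }
}

class Signal<T> {
  private subscribers = new Set<Computation>();
  constructor(private value: T) {}
  get(): T {
    // Reading a signal subscribes whatever computation is running.
    if (currentComputation !== null) this.subscribers.add(currentComputation);
    return this.value;
  }
  set(value: T): void {
    if (Object.is(value, this.value)) return; // unchanged: nothing re-runs
    this.value = value;
    // Copy first, since re-running can add subscribers during the loop.
    for (const c of [...this.subscribers]) c.run();
  }
}

// A derived value that re-runs fn whenever a signal it read changes.
function computed<T>(fn: () => T): { get: () => T; runs: () => number } {
  let runs = 1;
  const computation: Computation = {
    run() {
      runs++;
      result.set(track(computation, fn));
    },
  };
  // The first run records the initial dependencies.
  const result = new Signal(track(computation, fn));
  return { get: () => result.get(), runs: () => runs };
}

// Hypothetical pipeline with no dependency lists anywhere.
const size = new Signal(3);
const color = new Signal("red");
const cellCount = computed(() => size.get() ** 2);
const pathLength = computed(() => cellCount.get() - 1);
const picture = computed(
  () => `<path stroke="${color.get()}" data-length="${pathLength.get()}"/>`,
);

color.set("blue"); // re-runs only picture
size.set(4); // re-runs cellCount, then pathLength, then picture
```

`picture` is subscribed to `color` and `pathLength` simply because it read them. The sketch leaves out things real libraries handle: it never drops subscriptions a computation no longer needs, and it can re-run a value more than once when dependencies form a diamond.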

Every once in a while I like to pause to examine my work and figure out what's time consuming or error prone. I'm hoping that switching to a more functional+declarative style will let me make interactive tutorials more quickly in the future.


2 comments:

Danston wrote at March 17, 2018 1:01 PM

I was reading this eagerly waiting for you to bring up observable since it's a tool that seems to dovetail with your work nicely. I'm glad you found it.

Something about the software design patterns you described reminds me metaphorically of this https://simblob.blogspot.com/2010/06/teleological-vs-ontogenetic-map.html

Either the caller has to know what the callee cares about or vice versa. It just gave me a feeling of "top down" vs. "bottom up," especially since the "data down actions up" pattern is becoming so popular. Sometimes I visualize the data flowing through the algorithm to the visualization like rain water flowing downhill to the sea. The metaphor falls apart a bit because erosion would probably be a side effect in computer science terms.

Anyway, thanks for the informative post as always.

Amit wrote at March 18, 2018 9:25 AM

Thanks Danston, that's an interesting connection I hadn't made before. This makes me think there are probably other places I should be looking for a push pattern that could be reversed into a pull pattern, or vice versa. I always learn a lot by looking at a problem in several different ways.

I like the idea of picturing data flowing through an algorithm. In one notebook (https://beta.observablehq.com/@redblobgames/reshaping-distributions) I have data flowing through a sort algorithm, animating from the input to the output.