Sunday, July 22, 2012

Normally the “[]” operator is used for extracting an element out of a sequence. Sequence[Index] turns into *(Sequence+Index) in C. And since + is commutative, it’s also Index[Sequence]. Yes, you can write 3[a] and it’s the same as a[3]. Is that useful? Probably not. But it’s interesting.

I want to talk about the “.” operator. It’s taken for granted, like “[]”, but I think it’s interesting to question how it works.

Normally the “.” operator is used for extracting an element out of a record. In Python it’s two hash table lookups. Given a.x you first look up a in the current namespace (stack frame) to find a record, then look up x in the record. In many languages one or both of these steps are optimized, but I’m going to explore the straightforward unoptimized implementation.

Field lookup

Let’s consider field lookups a.x, a.y, b.n, and b.y. (Note: for simplicity I’m conflating variable names and record ids in the diagrams and sample code but they’re really separate things.) Here’s the conventional view:

SVG diagrams may not show up in feed readers - view original.varsStack FrameName“a”“b”Recordrec1Record “a”Field“x”“y”Value3.142.71vars:rec1->rec1:rootrec2Record “b”Field“n”“y”Value“no”“yes”vars:rec2->rec2:root

In code, we can think of it like this (Python syntax):

values = {}
values["a"] = {}
values["b"] = {}
values["a"]["x"] = 3.14
values["a"]["y"] = 2.71
values["b"]["n"] = "no"
values["b"]["y"] = "yes"

We think of the record as primary and the field as secondary. But does it have to be this way? One of the reasons this layout is preferred is optimization. In a language where record types are known, mapping from field name to offset in the record can be resolved at compile time (like this). This means we can use a simple dereference instead of a hash table lookup. For this blog post however I’m going to ignore optimizability and instead look at the alternatives.

Combined lookup

What if we put both the record id and the field name into a tuple, and made them siblings in the lookup process?

GvarsStack FrameName“a”“a”“b”“b”Field“x”“y”“n”“y”Value3.142.71“no”“yes”

In code, we can think of it like this:

values = {}
values[("a", "x")] = 3.14
values[("a", "y")] = 2.71
values[("b", "n")] = "no"
values[("b", "y")] = "yes"

Are there any downsides to this, other than optimizability? It’s a little more work to enumerate the fields of a record but you can mostly do the same sorts of things with this implementation as you can with the conventional implementation — lookup, assignment, field insertion, field deletion. This approach may be useful in some situations but I didn’t pursue it.

Record lookup

What if we made the field name primary and the record id secondary? Back in the 1990s when I was into programming languages (I’ve since recovered), I played around with this idea.

GvarsStack FrameName“x”“y”“n”FieldxField “x”Record“a”Value3.14vars:x->x:rootnField “n”Record“b”Value“no”vars:n->n:rootyField “y”Record“a”“b”Value2.71“yes”vars:y->y:root

In code, we can think of it like this:

values = {}
values["x"] = {}
values["y"] = {}
values["n"] = {}
values["x"]["a"] = 3.14
values["y"]["a"] = 2.71
values["n"]["b"] = "no"
values["y"]["b"] = "yes"

One interesting idea that came out of this experimental language was the idea that the field name could be local to a module or function. (See also: symbols in Lisp packages.) With this feature, you can attach fields to records without needing subclassing. (Ruby’s mixins are some like this except they only allow methods, not data.) Records become only IDs; all the actual data is attached from the outside. (See also: database normalization.)

Imagine being able to attach per-module fields to objects. Imagine being able to treat a “field” just as you might treat an “object” in a regular language: take fields as function parameters, return them from factory functions, import/export them with modules, store them inside objects, etc.!

Use in games

Is this useful in games? I think so. I think it might be related to what entity/component systems are trying to build without having direct language support. The entities are the records and the components are the modules and local data. Consider this data:

Gbig“x”“y”“health”“color”soldier131273850soldier249671891beacon15510“red”beacon21284“green”

Object-oriented systems organize this by column. There’ll be a soldier module that defines a class with x, y, and health, and a beacon module that defines a class with x, y, and color. There will probably be a game object superclass that contains x, y.

When you treat fields as first class constructs in the language, you organize this differently — by row. You’d put the x and y fields into one module, health into another, and color into a third. Code that works with soldiers would import the position and health modules. Code that works with beacons would import the position and color modules. And code that works on either would only import the position module. You can mix and match features — for example, an object with both health and color takes no additional classes or code in this system, because health and color are in separate modules. You keep traditional syntax like soldier2.health; it’s just that health might be a local field instead of a global one, visible only in modules that have imported the health field. You can attach local fields to records at run time and detach them when no longer needed.

It’s quite possible this system wouldn’t work out in practice. I never finished my programming language so I can’t point to a concrete test of the principles described here. The ideas from the experiment have definitely influenced how I write code; that’s the outcome I hope for when exploring an idea. It’s fun to take something mundane like “.” and ponder how else it could work.

Labels: ,

3 comments:

zimzat.com wrote at July 22, 2012 11:35 PM

While I'm not entirely sure how you got from data storage methods to class structure layouts, what you're describing at the very end sounds a lot like component-based designs. Instead of having an object be a concrete structure (and potentially duplication of code), instead objects are simply groups of components based on what behaviors you need them to respond to or accept. This in turn allows objects to be dynamically created based on blueprints of components and values within those components.

I've done something similar to this in my business-logic applications as well, when behavior is similar but specific rule sets needed to be built up based on various configurations.

I was trying to find where I originally read about this, including code examples, but all I was able to turn up was Cowboy Programming: Evolve Your Heirarchy, although it does include some nice illustrations (just no code).

Jotaf wrote at July 23, 2012 9:19 AM

Very interesting! I agree that this points in the direction of treating properties like first-class citizens, which is kinda the goal of component-based design.

The starting point, however, is what most caught my attention in this context (reversing the dot operator). Attaching meta-properties to properties is a nice prospect, but it could still be subsumed by the component architecture, which additionally allows you to group similar properties (x,y for example). It's a new perspective on components though, and makes them feel a bit more theoretically justified IMO!

Doug Orleans wrote at July 26, 2012 8:51 AM

I think I saw this idea first in BeCecil (a formalized version of Cecil)-- specifically, the idea of block-scoped fields. I did something similar in Socrates (the language I wrote for my PhD): a field is a hash-table, and objects are the keys. (Actually lists of objects, so you could have multi-fields a la multi-methods.) I don't know how practical it would be, but it seemed like a fun idea at the time...