Monday, December 26, 2011

Focus considered harmful

Most of us are used to the concept of focus in modern desktop UIs - there's usually one app that has it. But what is it? Essentially the app that has the magical focus is the target of all indirect input.

The way I see it, there are two classes of input devices in a common desktop system: direct and indirect. Direct input devices can readily and immediately target any app, while indirect ones cannot. Mice are direct input devices; keyboards are not.

The key problem with input is deciding who gets it. This was not a problem with the OSs of yore, where there was just one running app at all times; nor is it a problem on the new tablets, because they have only one "full screen" app running at any time. Desktops, with their multitasking, non-full-screen apps, however, have the problem in spades. Now that I use the Mac, I'm ticked off that Cmd-Tab doesn't necessarily mean you'll get the app front and center. You see, if you'd minimized the app before, you have to add Opt to the Cmd-Tab just before releasing on the app you want, so that it will pull up the minimized window as well. Otherwise it's just the menu at the top for you. Why? I'm sure there's a really super important Mac/Apple reason for this behavior, but it annoys the heck out of me.

Enter the newbie, exemplified in this case by my parents. My parents cannot figure focus out. They will painstakingly fire up an app and start typing, in the hope that Windows will magically figure out that where they're looking (or were looking, before looking down at the keyboard) is where they want the keystrokes to go. Of course, this is without having brought the cursor to the text field in the first place, so all that input accomplishes is bupkis. (Where do keystrokes typed without focus go, I wonder?) And if they had actually remembered to "make sure the blinking line is at the user name field", McAfee decides to pop up a message that silently steals that focus away.

Focus is an implementation artifact, and a leaky abstraction at that. The currently available approaches are:

  1. It's a non-issue: This is what tablets do - you don't need to worry about who gets the input if it's completely clear that there's always only one such target.
  2. No more indirect input: Again, this is what touch devices do; you don't need to pick a target when the act of providing input actually selects the target.
  3. Provide a switcher: This is the Alt/Cmd-Tab solution. Provide a way to switch the target of the indirect input. However, as we've seen above, this has problems.
Would it help to be even more explicit about who gets the keystrokes than just highlighting the app's title bar? Would it help to avoid modal (and non-modal, but focus-stealing) dialogs?

Is there a better way that tells the user that any indirect input will be sent to a particular app? Or, alternatively, is there a seamless way of "figuring out" the intended target? I can think of two promising avenues:
  1. Computer vision is slowly getting to the point where head tracking using webcams is actually possible. Can it reach the resolution required to figure out which app the user is actually looking at? Lots of false positives in the future down this path, but it's definitely a head-on approach to the problem.
  2.  No more leaks in the abstraction: This is a cheaper solution than computer vision, and essentially involves ensuring that:
    • There are separate "user-controlled" and "system-use" spaces, and never the twain shall overlap.
    • User-controlled space will always have exactly one app with focus. It will lose focus only when the user wills it to. The action to switch from one app to another is atomic and results in the other app getting focus (a sketch of this policy follows below). This will be true even on tiling window managers.
      • Dialogs and other such UI elements that halt the app pending user interaction will work as before.
    • In addition to serially switching the target of input using Cmd/Alt-Tab, there will be a key sequence to switch directly between apps - maybe like the Ctrl-Alt-F1-F7 sequence that Linuxes have for the default ttys.
    • System-use space will be read-only, except for some suitable way to scroll through messages.
Todo: figure out how to deal with system messages that require immediate attention.
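
To make option 2 concrete, here's a minimal sketch of such a focus policy in a toy window manager; all the class and method names are made up:

    import java.util.ArrayDeque;
    import java.util.Deque;

    // A toy model of the "no leaks" policy: focus moves only through one
    // atomic, user-initiated operation; system messages can never steal it.
    final class FocusPolicy {
        private String focusedApp = "desktop";                           // user-controlled space
        private final Deque<String> systemMessages = new ArrayDeque<>(); // system-use space

        // The ONLY way focus changes: a deliberate user action, atomic by design.
        synchronized void userSwitchTo(String app) {
            focusedApp = app;
        }

        // System-space events queue up for the user to scroll through later;
        // note that this method never touches focusedApp.
        synchronized void postSystemMessage(String message) {
            systemMessages.addLast(message);
        }

        // All indirect input (keystrokes etc.) is routed here, unconditionally.
        synchronized String targetOf(String keystroke) {
            return focusedApp;
        }
    }

The point is structural: postSystemMessage() has no code path that touches focusedApp, so a McAfee-style popup simply cannot grab the keyboard.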

Functional vs Implementation Models

Our apartment building has a down ramp from the elevator to the basement level. My kids usually run down this ramp, stopping just short of the driveway for cars exiting the basement.

Of course, they stop short because they've been told that "Cars are coming"; or, as my under-2-yr-old says, "because car comin".

The other day he ran down the ramp as usual, stopped short of the driveway and chanted his usual "because car comin"; then paused to look, said "Car no comin" and proceeded to walk across the driveway.

I'd almost started to blurt out the "Look to your left, look to your right, then look to your left again.." routine in an effort to instill traffic safety at an early age (I know, parents) when he said "Car no comin". Since he did that, I kept quiet.

It then struck me that his way was much better than the "look to your left" routine, because he'd formed a mental model of when cars come and when they don't. Once he has that model, he's free to decide what to do: cross the driveway, stay put, or whatever.

The problem with the routine is that it's prescriptive, and therefore - by definition - too restrictive. How would a child's mind associate such a routine with the situation? It seems to me there are quite a few wrong associations that can be made for the one right one:

Right:

  • When crossing a driveway, do the routine for your own safety.

Wrong:

  • When you run down a ramp, at the end do the routine
  • When you stop, do the routine
  • When daddy shouts as I'm running down a ramp, stop and do the routine
More importantly, the value of looking to the left again will, IMO, be completely lost on a child. Wouldn't it be much better if he arrived at that step by himself?

Thus too, it seems, with software. As long as we build in the right functional models, it should be easy to instill the right implementation model - and, by extension, to modify or maintain it.

Wednesday, December 14, 2011

OS.next

  • Hardware Compatibility:
    • Run on x86 (Intel and AMD)
  • Software Compatibility:
    • Support Windows (PE), Unix (ELF) and Mac (Mach-O) executables natively.
  • Boot loader:
    • Support booting from USB and SD card
  • OS:
    • Ability to have the whole OS on portable media (e.g., an SD card) if so required
    • File system:
      • Tagged storage instead of directories (see the sketch after this list)
      • Sensible file system hierarchy that combines win/unix/mac concepts
      • FS shadowing for legacy executables so that they think they're still "at home"
      • Ability to move /home to an SD card if so required. More generally, the ability to configure the OS to have any "standard, should be on boot volume" directory elsewhere.
    • Memory:
      • Nothing special as of now.
    • Disk:
      • Versioning of files built in, i.e., some kind of journaling/copy-on-write file system.
      • Security built in - encrypt a file, directory, mount point or whole system.
    • UI:
      • Light, capability-based UI framework.
      • Support for UI paradigms such as radial menus - controls that use Fitts's Law better, basically
      • "Back to the place I was before" mode for people not ready to take the plunge
      • Keyboard support for everything. EVERYTHING!
    • Software in general:
      • Solve the package manager/installer problem for good
      • Solve the registry vs 1000s of config files problem for good
      • Rolling versions of the OS, easy revert back to previous version
      • Tool to see files changed due to any install/ version upgrade built into the OS
    • Shell:
      • Normalization of options and output formats across at least the busybox commands
      • Autocomplete for command options, derived from info/help files
      • OO-ize commands (i.e., dir.ls instead of cd dir; ls)
      • Structured editing of the command line a la zsh
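
As a taste of the tagged-storage idea in the file system section, here's a minimal sketch assuming a naive in-memory index; a real version would live at the VFS layer, and all the names here are made up:

    import java.nio.file.Path;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Toy tag index: files are addressed by sets of tags instead of paths.
    final class TagStore {
        private final Map<String, Set<Path>> byTag = new HashMap<>();

        void tag(Path file, String... tags) {
            for (String t : tags)
                byTag.computeIfAbsent(t, k -> new HashSet<>()).add(file);
        }

        // "Listing a directory" becomes an intersection query over tags.
        Set<Path> find(String... tags) {
            Set<Path> result = null;
            for (String t : tags) {
                Set<Path> files = byTag.getOrDefault(t, Collections.emptySet());
                if (result == null) result = new HashSet<>(files);
                else result.retainAll(files);
            }
            return result == null ? Collections.emptySet() : result;
        }
    }

With this, store.find("photos", "2011") replaces having to remember that the files live under ~/Pictures/2011/.
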
Updates to this post as I think 'em up :)

Update #1:

Show a schematic box diagram of the computer system as it boots up and have icons (or text) depicting the status of each box in the diagram.

This is in lieu of the "screens of flying text" a la Linux, or the "startup progress bar that never quite ends" of Windows/OSX.

I also missed adding my idea about intelligent workspaces, but it should be on this list.

Tuesday, December 13, 2011

Codermetrics: A review

I picked up a book called Codermetrics (book, website) yesterday - one of those impulse purchases, because I liked what it said on the cover. I haven't finished reading it completely, but I've been through most of the meat of the book, and it's an interesting concept.

The core concept of the book is simple: measure a developer - and by extension the development team and organization - the same way the world measures sportspersons, teams and organizations. The book's name is a play on Sabermetrics, which apparently is all the rage in baseball these days as far as metrics go.

After a somewhat drawn-out beginning, the bulk of the book focuses on defining metrics for developers, for teams/organizations, and for their combination. The developer metrics measure how good a developer (called a coder throughout the book) is at advancing the team towards the org's goals (Offensive Impact); how good s/he is at preventing or avoiding the things that hold the team back from those goals (Defensive Impact); and so forth. Specific attention is paid to work done beyond the call of duty (Pluses) and help given to others (Assists).

Next, the organization is defined in terms of its successes (Wins), failures (Losses) and position relative to competitors. In each case, the chapter ends with examples or archetypes that arise from specific combinations of values that the metrics may take - for example, what separates a junior coder from an architect, or what separates an enterprise company from a startup.

The final set of metrics - called Value metrics - were the most interesting. While the previous sets of metrics required some input data from the real world, these are derived from the other metrics and are intent on exploring an individual's contributions factored into the larger scheme of things in the organization. Things like the influence a particular developer has on the team and company, or the teamwork demonstrated, become measurable!
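
To show the flavor of such a derivation - and these are emphatically not the book's formulas, just invented stand-ins - a derived metric might be computed like this:

    // Invented stand-ins, NOT the book's formulas: the shape of the idea is
    // that observed stats combine into derived "value" metrics.
    final class CoderStats {
        double offensiveImpact; // advancing the team toward org goals
        double defensiveImpact; // preventing or avoiding setbacks
        double pluses;          // work beyond the call of duty
        double assists;         // help given to others

        // A hypothetical "influence" score: the share of the team's total
        // impact that flows through this coder's helping and extra work.
        double influence(double teamTotalImpact) {
            if (teamTotalImpact == 0) return 0;
            return (assists + pluses) / teamTotalImpact;
        }
    }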

The final section of the book has some advice on how to go about implementing such a process in an organization.

Overall, the concept of the book and the simple language used to explain it are a win; as are the author's repeated disclaimers that we should heed his overall construct, not the specific metrics he provides. This is sane advice, and a welcome change from the typically prescriptive nature of texts on metrics.

I've started trying this out on one of my teams, which is particularly feeling the pain of under-delivery. Highly recommended.

Sunday, December 11, 2011

Information Density and textual vs visual

I was reading the Wikipedia page on Information Theory as part of my read-the-wiki project when it struck me that there's possibly an objective way of measuring the effectiveness of textual vs visual programming languages using Information Theory.

The central concepts (whose math I'm admittedly unable to fathom) are those of information, information rate, entropy and SNR. One of the age-old cases for text-based programming (and therefore against non-textual programming languages) has been that it has a very high SNR, and that its "information density" is high for the given screen real estate.

Is that really true, though? How much "noise" does syntax add? On the other side of the spectrum, I've seen infographics that assuredly deliver more "understanding" in the same screen space than the equivalent textual description. Is it possible to design an "infographic-style" programming language that packs more power per square inch than ASCII?
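
As a crude first stab at quantifying this, one could measure the character-level Shannon entropy of program text - admittedly a gross simplification of what a real analysis would need:

    import java.util.HashMap;
    import java.util.Map;

    // Character-level Shannon entropy (bits/char) as a naive stand-in for
    // "information density". Token- or semantics-level measures would be
    // needed for a real comparison of textual vs visual languages.
    final class Density {
        static double entropyBitsPerChar(String text) {
            Map<Character, Integer> counts = new HashMap<>();
            for (char c : text.toCharArray())
                counts.merge(c, 1, Integer::sum);
            double h = 0, n = text.length();
            for (int count : counts.values()) {
                double p = count / n;
                h -= p * (Math.log(p) / Math.log(2));
            }
            return h;
        }

        public static void main(String[] args) {
            System.out.println(entropyBitsPerChar("for(int i=0;i<10;i++) sum+=i;"));
        }
    }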

It would be interesting to do some analysis in this area.

Descriptive Drawing Language

I have been thinking about the way UI mockups are created. In most Visio-style applications - which are the mainstay for this kind of thing - there's usually a palette of known "widgets" that can be dragged onto a canvas and put into place using the painter's algorithm. The good part about such an arrangement is that it allows easy creation of UI mockups, and thereby easy expression of the look of the app-to-be.

This works when the UI framework is an established one and the set of widgets is known. But what if you're trying to create a new UI framework and you want it to be completely (or sufficiently) different from the current pack?

The other problem I see with the current way of doing things is that UI mockups/wireframes are still woefully disconnected from the rest of the development process: the designers build their mockups and pass them on to the graphic artists, who finish up the design with dimensions, colors etc. and finally deliver a finished design (typically in HTML for webapps, most probably PSD for others) to the developer - who now has to tear down the carefully crafted design to get at its constituent pieces.

I was thinking of an alternative that's inspired in part by Graphviz.

Imagine being able to draw a picture by describing what it contained. So a description of a wheel would be something like:

wheel:
    circleA: circle
    circleB: circle
    line1, line2, line3, line4: line

    circleA contains circleB
    line1: pointAt(circleA, top) - pointAt(circleB, top)
    line2: pointAt(circleA, right) - pointAt(circleB, right)
    line3: pointAt(circleA, bottom) - pointAt(circleB, bottom)
    line4: pointAt(circleA, left) - pointAt(circleB, left)

The syntax is obviously very handwavy at this point, but hopefully the intent is clear:

  • You describe the scene, not paint it
  • You define relationships between the objects not place them in specific positions. 
  • You provide dimensions only if absolutely required; otherwise the system figures them out for you. This is obviously not intended for "to scale" diagrams, although I can already imagine a Graphviz-style mode that outputs the same format as the input - only annotated with dimension information that can be processed further to make it to scale.
The advantages I see with such an approach are:

  • Diagramming becomes descriptive. It could as easily have a Canvas backend as a GL one, but the scene description would be the same
  • UI wireframes can be built "from scratch" more easily than before, and thus support early exploration of new ways of expressing design intent
  • UI wireframes become instantly accessible to version control
  • Developers can literally write diagrams!
Design notes:

  • There will obviously be primitives - line, circle, arc, etc. come to mind.
  • The system will allow creation of compound objects from simpler ones and from primitives
  • Spatial organization in a 2D plane will be represented by a logical grid, or using predicates such as "contains", "to the left of", "to the right of" etc.
  • Overlapping objects will be represented by a Z-ordering (as in most such tools) using predicates such as "in front of" and "behind". Alternatively, the system could allow depiction of each "layer" in the Z-axis as a slice within which no front/behind predicates are allowed
  • Objects can have "anchor points" defined, which allow other objects to use those specific points in other areas of the script.
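
To make the resolution step less handwavy, here's a minimal sketch of how the wheel description above might be turned into geometry: "contains" becomes concentric placement at a smaller radius, and pointAt() becomes coordinate arithmetic. The numbers (center, radii, the 0.5 shrink factor) are invented defaults standing in for what the system would infer:

    // Resolves the wheel description into concrete SVG: two concentric
    // circles plus four spokes joining their compass points.
    final class Wheel {
        record Circle(double cx, double cy, double r) {
            double[] pointAt(String side) {
                return switch (side) {
                    case "top"    -> new double[]{cx, cy - r};
                    case "bottom" -> new double[]{cx, cy + r};
                    case "left"   -> new double[]{cx - r, cy};
                    case "right"  -> new double[]{cx + r, cy};
                    default       -> throw new IllegalArgumentException(side);
                };
            }
        }

        public static void main(String[] args) {
            Circle a = new Circle(100, 100, 80);       // circleA, the rim
            Circle b = new Circle(100, 100, 80 * 0.5); // "contains" => concentric, smaller
            System.out.println("<svg xmlns='http://www.w3.org/2000/svg'>");
            for (Circle c : new Circle[]{a, b})
                System.out.printf("<circle cx='%.0f' cy='%.0f' r='%.0f' fill='none' stroke='black'/>%n",
                        c.cx, c.cy, c.r);
            for (String side : new String[]{"top", "right", "bottom", "left"}) {
                double[] p = a.pointAt(side), q = b.pointAt(side);
                System.out.printf("<line x1='%.0f' y1='%.0f' x2='%.0f' y2='%.0f' stroke='black'/>%n",
                        p[0], p[1], q[0], q[1]);
            }
            System.out.println("</svg>");
        }
    }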

Sunday, December 04, 2011

My Bowerick Wowbagger homage... aka the Wikipedia breadth-first read

I've been meaning to start this for quite a while now.

If you've read The Hitchhiker's Guide to the Galaxy, you'll probably remember Wowbagger the Infinitely Prolonged, who takes it upon himself to insult every living being in alphabetical order just to keep himself busy?

Well, aside from the immortality, the need to keep myself busy, and the insulting bits, this is my Wowbagger-esque project.

I've started reading Wikipedia at the Computer Science page, and I aim to keep reading it in breadth-first fashion from that page onwards. I hope to read one page every day. Along the way, I'll track my course with a Graphviz graph that will eventually be a map of the Wikipedia subtree (subgraph?) rooted at "Computer Science". A simple rule for picking the next page to read: I'll pick the one I know the least about.
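
In code terms, the plan is roughly the following; linksOf() and howWellIKnow() are hypothetical stand-ins for fetching a page's links and for my own self-assessment:

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Level-by-level (breadth-first) reading, taking the least-known page
    // first within each level.
    final class WikiWalk {
        public static void main(String[] args) {
            Set<String> read = new HashSet<>();
            List<String> frontier = new ArrayList<>(List.of("Computer Science"));
            while (!frontier.isEmpty()) {
                frontier.sort((a, b) -> howWellIKnow(a) - howWellIKnow(b));
                List<String> next = new ArrayList<>();
                for (String page : frontier) {
                    if (!read.add(page)) continue;          // already read via another path
                    System.out.println("\"" + page + "\""); // node in the graphviz map
                    next.addAll(linksOf(page));
                }
                frontier = next;
            }
        }

        static int howWellIKnow(String page) { return 0; }             // stub: self-assessment
        static List<String> linksOf(String page) { return List.of(); } // stub: page fetch
    }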

I've also created a Wikipedia account, should it come in useful somehow. I'll look to see if I can fix simple typos and such as they arise; and if there are indeed stub pages that I can fill out, I'll try.

Why
Because one of my (admittedly naive) goals in life when I was still studying was to "know everything there is to know about computers". A decade and a half later I still remember it, so it must mean something. I figure this is the least I can do :)

PS: Yes, I did think about what I'd do if a page I'd already read changes. I dunno. We'll see how it goes. Even Wowbagger did Arthur twice :)

Thursday, December 01, 2011

I visited JavaFXville and all I got was this post

I tried to write an FLV player in JavaFX today.

I tried very hard.

I didn't get very far. Or when I did, it wasn't obvious how far I'd actually gotten.

It all started with me looking for a trustworthy way of downloading YouTube videos - you know, the Randy Pausch kind - to share with my team easily.

Googling for Java and FLV showed that JavaFX spoke FLV natively. Awesome. I'd heard enough about JavaFX in its previous incarnation as F3 to think it was worth a shot for a weekend project. Not to mention that this is the closest one can get these days to a simple declarative GUI.

Of course, I'm sadly behind on the news front. Oracle, in its infinite sagacity, decided that the declarative DSL was to go, replaced by a Java API. So what looked like a simple sample in the 1.3 documentation morphed into this beast of boilerplate in 2.0.
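
For contrast, this is roughly what the 2.0 Java API asks of you just to play a single file (the file path below is a placeholder):

    import javafx.application.Application;
    import javafx.scene.Group;
    import javafx.scene.Scene;
    import javafx.scene.media.Media;
    import javafx.scene.media.MediaPlayer;
    import javafx.scene.media.MediaView;
    import javafx.stage.Stage;

    // The JavaFX 2.0 take on "play a video": all plumbing, no declaration.
    public class FlvPlayer extends Application {
        @Override
        public void start(Stage stage) {
            Media media = new Media("file:///tmp/sample.flv"); // placeholder path
            MediaPlayer player = new MediaPlayer(media);
            player.setAutoPlay(true);
            stage.setScene(new Scene(new Group(new MediaView(player)), 640, 360));
            stage.show();
        }

        public static void main(String[] args) {
            launch(args);
        }
    }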

But wait, there's FXML - the Oracle-blessed replacement for JavaFX Script, right? It IS XML, but it's still declarative, right?

Wrong. Conjuring up the right incantations to get MediaView to instantiate a Media object is obviously beyond me. I've never liked XML as a UI description language, and this abomination only stokes that feeling.

So I give that up and get back to the POJO version, which works for the sample FLV they provide.

So I proceed to download a YouTube video using a SourceForge-based downloader (also Java), and the bloody thing doesn't load. Maybe the downloader is broken?

I dunno. The code looks like it's doing multiple passes over the embed element's params - and justifiably so, because that value is a snakepit of multiple rounds of URL encoding. Security by obscurity, perhaps?

Ugh.

It's 4 in the AM, and all I have to show for my efforts is this blog post.

Final nail in the coffin: I now read that JavaFX has spotty support for Linux. The whole intent of this was to create something in Java so that I could share it with my Linux-using team.

FML.