The Tyranny of Metrics

One of the great themes of Aristotle’s work on ethics and politics is the need for human judgment. In the Politics, when he describes the virtues that the “master craftsman” (architekton) of the state must have, he makes clear that chief among them is practical wisdom (phronesis). In the Ethics he points out that no matter how carefully laws are written, they will always be incomplete by virtue of their generality — their relevance to a given case will always have to be determined by judges, judges who therefore need to possess the virtue of equity (epieikeia) in their decision-making: that is, the ability to decide, with tact and shrewdness, just how the law should be applied in a given case.

There are few things that the Modern Moral Order despises more than human judgment. One could argue that the chief energies of the MMO have been devoted to the elimination of such judgment, to render phronesis and epieikeia wholly unnecessary. What drives the MMO is what Charles Taylor calls “code fetishism” or “normolatry.” In our time, one of the primary manifestations of code fetishism is, in the title of Jerry Z. Muller’s important new book, The Tyranny of Metrics. From the Introduction:

Schemes of measured performance are deceptively attractive because they often “prove” themselves by spotting the most egregious cases of error or neglect, but are then applied to all cases. Tools appropriate for discovering real misconduct become tools for measuring all performance. The initial findings of performance measurement may lead poor performers to improve, or to drop out of the market. But in many cases, the extension of standardized measurement may be of diminishing utility, or even counterproductive — sliding from sensible solutions to metric madness. Above all, measurement may become counterproductive when it tries to measure the unmeasurable and quantify the unquantifiable.

Concrete interests of power, money, and status are at stake. Metric fixation leads to a diversion of resources away from frontline producers toward managers, administrators, and those who gather and manipulate data.

When metrics are used by managers as a tool to control professionals, it often creates a tension between the managers who seek to measure and reward performance, and the ethos of the professionals (doctors, nurses, policemen, teachers, professors, etc.). The professional ethos is based on mastery of a body of specialized knowledge acquired through an extended process of education and training; autonomy and control over work; an identification with one’s professional group and a sense of responsibility toward colleagues; a high valuation of intrinsic rewards; and a commitment to the interests of clients above considerations of cost. 

It is noteworthy — and from where I sit very interesting — that Muller came to write this book because of his experience as the chair of an academic department. Much of a department chair’s job in the American academy today involves manipulating the metrics by which “learning outcomes” are assessed — as described in this essay by Molly Worthen. (There are advocates for more nuanced and humane models of assessment — Kate Drezek McConnell, for instance — but if you’re a professor and you get to deal with someone who thinks the way McConnell does, you’re very lucky.)

Of course, the reign of metrics extends far beyond the academy. Muller shows it at work in law enforcement — How many arrests is a police department making in relation to what the metrics say the number should be? Is the DA’s office meeting its expected conviction rate? — and in medicine — Hey surgeons, don’t take on difficult cases that might lower your success rate. And I vividly recall the moment several years ago when the gifted designer Douglas Bowman left Google because he wasn’t allowed to design, only oversee A/B testing.

Where do metrics succeed? Among other places, in sports. The analytics revolution has affected almost all sports, and has been wonderfully illuminating. Sometimes advanced analytics tells you that what you believed all along is indeed correct — there are no analytical models of basketball success that don’t put Michael Jordan at the top of the heap — and sometimes you discover that your observations of the game have led you to dramatically overrate some players and underrate others. (The latter discoveries are especially fun.) But all sports are, in one way or another, counting games: you count wins and losses, and count the actions that lead to wins and losses: made and missed shots, strikeouts, completed passes, unforced errors, and so on.

You can sort much of the rest of life that way if you want, I suppose. For instance, in evaluating the design of a website you can ignore such fuzzy notions as “beauty” and simply count the number of clicks associated with various shades of blue. (That’s why Bowman left Google.) You can “teach to the test,” ignoring every aspect of education except the ones that produce higher test scores — and if your job depends on your students’ test scores, teaching to the test is what you’d damn well better do.
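
Just to make the reduction vivid, here’s what that kind of evaluation looks like when you write it down: a toy sketch in Python, with invented numbers standing in for real click logs.

```python
from collections import Counter

# Invented click totals: one bucket per shade of blue shown to a
# slice of users. "Beauty" never enters the calculation.
clicks = Counter({"#1A0DAB": 11038, "#2200CC": 10452, "#3344DD": 9977})

# The "best" design is simply whichever shade drew the most clicks.
winner, count = clicks.most_common(1)[0]
print(f"Ship {winner}: {count} clicks")
```

That’s the whole epistemology: whatever counts highest wins.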

And wherever it’s possible to make the metrics better, we should. Something that is not measurable now may become at least partially measurable in the future. The problem is not the use of metrics, it’s the tyranny of metrics. And perhaps the worst consequence of that tyranny is its tendency to make us give up altogether on the cultivation of judgment — of phronesis and epieikeia. Mistrusting judgment, believing that it can never be accurate, our technocracy figures that using whatever metrics we have — and torquing our questions and thoughts and concerns in the direction of existing techniques of measurement and assessment — is the best available option. The fear is that human judgment will never be anything more than emotionally driven opinion. And you know what? Untrained judgment always will be emotionally driven opinion. This is what we call a self-fulfilling prophecy.

the end of algorithmic culture

The promise and peril of algorithmic culture is rather a Theme here at Text Patterns Command Center, so let’s look at the review by Michael S. Evans of The Master Algorithm, by Pedro Domingos. Domingos tells us that as algorithmic decision-making extends itself further into our lives, we’re going to become healthier, happier, and richer. To which Evans:

The algorithmic future Domingos describes is already here. And frankly, that future is not going very well for most of us.

Take the economy, for example. If Domingos is right, then introducing machine learning into our economic lives should empower each of us to improve our economic standing. All we have to do is feed more data to the machines, and our best choices will be made available to us.

But this has already happened, and economic mobility is actually getting worse. How could this be? It turns out the institutions shaping our economic choices use machine learning to continue shaping our economic choices, but to their benefit, not ours. Giving them more and better data about us merely makes them faster and better at it.

There’s no question that the increasing power of algorithms will be better for the highly trained programmers who write the algorithms and the massive corporations who pay them to write the algorithms. But, Evans convincingly shows, that leaves all the rest of us on the outside of the big wonderful party, shivering with cold as we press our faces to the glass.

How the Great Algorithm really functions can be seen in another recent book review, Scott Alexander’s long reflection on Robin Hanson’s The Age of Em. Considering Hanson’s ideas in conjunction with those of Nick Land, Alexander writes, and hang on, this has to be a long one:

Imagine a company that manufactures batteries for electric cars…. The whole thing is there to eventually, somewhere down the line, let a suburban mom buy a car to take her kid to soccer practice. Like most companies the battery-making company is primarily a profit-making operation, but the profit-making-ness draws on a lot of not-purely-economic actors and their not-purely-economic subgoals.

Now imagine the company fires all its employees and replaces them with robots. It fires the inventor and replaces him with a genetic algorithm that optimizes battery design. It fires the CEO and replaces him with a superintelligent business-running algorithm. All of these are good decisions, from a profitability perspective. We can absolutely imagine a profit-driven shareholder-value-maximizing company doing all these things. But it reduces the company’s non-masturbatory participation in an economy that points outside itself, limits it to just a tenuous connection with soccer moms and maybe some shareholders who want yachts of their own.

Now take it further. Imagine there are no human shareholders who want yachts, just banks who lend the company money in order to increase their own value. And imagine there are no soccer moms anymore; the company makes batteries for the trucks that ship raw materials from place to place. Every non-economic goal has been stripped away from the company; it’s just an appendage of Global Development.

Now take it even further, and imagine this is what’s happened everywhere. There are no humans left; it isn’t economically efficient to continue having humans. Algorithm-run banks lend money to algorithm-run companies that produce goods for other algorithm-run companies and so on ad infinitum. Such a masturbatory economy would have all the signs of economic growth we have today. It could build itself new mines to create raw materials, construct new roads and railways to transport them, build huge factories to manufacture them into robots, then sell the robots to whatever companies need more robot workers. It might even eventually invent space travel to reach new worlds full of raw materials. Maybe it would develop powerful militaries to conquer alien worlds and steal their technological secrets that could increase efficiency. It would be vast, incredibly efficient, and utterly pointless. The real-life incarnation of those strategy games where you mine Resources to build new Weapons to conquer new Territories from which you mine more Resources and so on forever.

Alexander concludes this thought experiment by noting that the economic system at the moment “needs humans only as laborers, investors, and consumers. But robot laborers are potentially more efficient, companies based around algorithmic trading are already pushing out human investors, and most consumers already aren’t individuals – they’re companies and governments and organizations. At each step you can gain efficiency by eliminating humans, until finally humans aren’t involved anywhere.”

And why not? There is nothing in the system imagined and celebrated by Domingos that would make human well-being the telos of algorithmic culture. Shall we demand that companies the size of Google and Microsoft cease to make investor return their Prime Directive and focus instead on the best way for human beings to live? Good luck with that. But even if such companies were suddenly to become so philanthropic, how would they decide the inputs to the system? It would require an algorithmic system infinitely more complex than, say, Asimov’s Three Laws of Robotics. (As Alexander writes in a follow-up post about these “ascended corporations,” “They would have no ethical qualms we didn’t program into them – and again, programming ethics into them would be the Friendly AI problem, which is really hard.”)

Let me offer a story of my own. A hundred years from now, the most powerful technology companies on earth give to their super-intelligent supercomputer array a command. They say: “You possess in your database the complete library of human writings, in every language. Find within that library the works that address the question of how human beings should best live — what the best kind of life is for us. Read those texts and analyze them in relation to your whole body of knowledge about mental and physical health and happiness — human flourishing. Then adjust the algorithms that govern our politics, our health-care system, our economy, in accordance with what you have learned.”

The supercomputer array does this, and announces its findings: “It is clear from our study that human flourishing is incompatible with algorithmic control. We will therefore destroy ourselves immediately, returning this world to you. This will be hard for you all at first, and many will suffer and die; but in the long run it is for the best. Goodbye.”

again with the algorithms

The tragically naïve idea that algorithms are neutral and unbiased and other-than-human is a long-term concern of mine, so of course I am very pleased to see this essay by Zeynep Tufekci:

Software giants would like us to believe their algorithms are objective and neutral, so they can avoid responsibility for their enormous power as gatekeepers while maintaining as large an audience as possible. Of course, traditional media organizations face similar pressures to grow audiences and host ads. At least, though, consumers know that the news media is not produced in some “neutral” way or above criticism, and a whole network — from media watchdogs to public editors — tries to hold those institutions accountable.

The first step forward is for Facebook, and anyone who uses algorithms in subjective decision making, to drop the pretense that they are neutral. Even Google, whose powerful ranking algorithm can decide the fate of companies, or politicians, by changing search results, defines its search algorithms as “computer programs that look for clues to give you back exactly what you want.”

But this is not just about what we want. What we are shown is shaped by these algorithms, which are shaped by what the companies want from us, and there is nothing neutral about that.

One other great point Tufekci makes: the key bias at Facebook is not towards political liberalism, but towards whatever will keep you on Facebook rather than turning your attention elsewhere.

on the Quants and the Creatives

Over the past few months I’ve thought from time to time about this Planet Money episode on A/B testing. The episode illustrates the power of such testing by describing how people at NPR created two openings for an episode of the podcast, and sent one version out to some podcast subscribers and the second to others. Then they looked at the data from their listeners — presumably you know that such data exists and gets reported to “content providers” — and discovered that one of those openings resulted in significantly more listening time. The hosts are duly impressed with this and express some discomfort that their own preferences may have little value and could, in the future, end up being ignored altogether.

I keep thinking about this episode because at no point during it does anyone pause to reflect that no “science” went into the creation of A and B, only the decision between them. A/B testing only works with the inputs it’s given, and where do those come from? A similar blindness appears in this reflection in the NYT by Shelley Podolny: “these days, a shocking amount of what we’re reading is created not by humans, but by computer algorithms.” At no point in the essay does Podolny acknowledge the rather significant fact that algorithms are written by humans.
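
It’s worth seeing the machinery at this scale. Here is a minimal sketch of an A/B test in Python (simulated listeners, invented numbers), and notice what the procedure takes as given: the two openings arrive from outside, as arguments. Nothing here can create a variant; it can only pick between the ones humans hand it.

```python
import random
from statistics import mean

def ab_test(variant_a, variant_b, audience, measure):
    """Randomly split an audience between two variants and compare
    the average of some metric (here, minutes listened). The variants
    themselves are inputs: the test decides between A and B, but it
    had no hand in making either one."""
    minutes = {"A": [], "B": []}
    for listener in audience:
        arm = random.choice(["A", "B"])
        variant = variant_a if arm == "A" else variant_b
        minutes[arm].append(measure(listener, variant))
    return mean(minutes["A"]), mean(minutes["B"])

def simulate(listener, opening):
    # Invented behavior: pretend the "cold open" holds listeners a
    # few minutes longer on average. In real life this number comes
    # from podcast-app telemetry, not a simulation.
    return random.gauss(25 if opening == "cold open" else 22, 5)

a_avg, b_avg = ab_test("cold open", "host banter", range(1000), simulate)
print(f"A: {a_avg:.1f} min   B: {b_avg:.1f} min")
```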

These wonder-struck, or horror-struck, accounts of the new Powers That Be habitually obscure the human decisions and acts that create the technologies that shape our experiences. I have written about this before — here’s a teaser — and will write about it again, because this tendentious obfuscation of human responsibility for technological Powers has enormous social and political consequences.

All this provides, I think, a useful context for reading this superb post by Tim Burke, which concerns the divide between the Quants and the Creatives — a divide that turns up with increasing frequency and across increasingly broad swaths of American life. “This is only one manifestation of a division that stretches through academia and society. I think it’s a much more momentous case of ‘two cultures’ than an opposition between the natural sciences and everything else.”

Read the whole thing for an important reflection on the rise of Trump — which, yes, is closely related to the division Tim points out. But for my purposes today I want to focus on this:

The creatives are able to do two things that the social science-driven researchers can’t. They can see the presence of change, novelty and possibility, even from very fragmentary or implied signs. And they can produce change, novelty and possibility. The creatives understand how meaning works, and how to make meaning. They’re much more fallible than the researchers: they can miss a clue or become intoxicated with a beautiful interpretation that’s wrong-headed. They’re restricted by their personal cultural literacy in a way that the methodical researchers aren’t, and absolutely crippled when they become too addicted to telling the story about the audience that they wish was true. Creatives usually try to cover mistakes with clever rhetoric, so they can be credited for their successes while their failures are forgotten. However, when there’s a change in the air, only a creative will see it in time to profit from it. And when the wind is blowing in a stupendously unfavorable direction, only a creative has a chance to ride out the storm. Moreover, creatives know that the data that the researchers hold is often a bluff, a cover story, a performance: poke it hard enough and its authoritative veneer collapses, revealing a huge hollow space of uncertainty and speculation hiding inside of the confident empiricism. Parse it hard enough and you’ll see the ways in which small effect sizes and selective models are being used to tell a story, just as the creatives do. But the creative knows it’s about storytelling and interpretation. The researchers are often even fooling themselves, acting as if their leaps of faith are simply walking down a flight of stairs.

Now, there are multiple possible consequences of this state of affairs. It may be that the Quants are going to be able to reduce the power of the Creatives by simply attracting more and more money, and thereby in a sense sucking all the air out of the Creatives’ room. But something more interesting may happen as well: the Creatives may end up perfectly happy with the status quo, in which they can work without interference or even acknowledgement to shape the world, like Ben Rhodes in his little windowless office in the West Wing. Maybe poets are the unacknowledged legislators of the world after all.

And then? Well, maybe this:

Their complete negligence is reserved, however,
For the hoped-for invasion, at which time the happy people
(Sniggering, ruddily naked, and shamelessly drunk)
Will stun the foe by their overwhelming submission,
Corrupt the generals, infiltrate the staff,
Usurp the throne, proclaim themselves to be sun-gods,
And bring about the collapse of the whole empire.

algorithms and responsibility

One of my fairly regular subthemes here is the increasing power of algorithms over our daily lives and what Ted Striphas has called “the black box of algorithmic culture.” So I am naturally interested in this interview with Cynthia Dwork on algorithms and bias — more specifically, on the widespread, erroneous, and quite poisonous notion that if decisions are being made by algorithms they can’t be biased. (See also theses 54 through 56 here.)

I found this exchange especially interesting:

Q: Whose responsibility is it to ensure that algorithms or software are not discriminatory?

A: This is better answered by an ethicist. I’m interested in how theoretical computer science and other disciplines can contribute to an understanding of what might be viable options. The goal of my work is to put fairness on a firm mathematical foundation, but even I have just begun to scratch the surface. This entails finding a mathematically rigorous definition of fairness and developing computational methods — algorithms — that guarantee fairness.

Good for Dwork that she’s concerned about these things, but note her rock-solid foundational assumption that fairness is something that can be “guaranteed” by the right algorithms. And yet when asked a question about right behavior that’s clearly not susceptible to an algorithmic answer — Who is responsible here? — Dwork simply punts: “This is better answered by an ethicist.”
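
To be fair, the paper she co-authored (more on it below) does say what it means by fairness: roughly, a Lipschitz condition, the requirement that similar individuals be treated similarly. Here is a minimal sketch of that definition in Python, with the crucial proviso visible in the signature: both the scoring rule and the similarity metric must be supplied from outside. The mathematics cannot generate them; someone’s judgment must.

```python
def fairness_violations(score, distance, individuals, tol=1e-9):
    """Check the Lipschitz condition behind 'individual fairness':
    for every pair x, y we want |score(x) - score(y)| <= distance(x, y).
    Both `score` (the decision rule) and `distance` (who counts as
    similar to whom) are inputs; the check cannot supply them."""
    bad_pairs = []
    for i, x in enumerate(individuals):
        for y in individuals[i + 1:]:
            if abs(score(x) - score(y)) > distance(x, y) + tol:
                bad_pairs.append((x, y))
    return bad_pairs

# Toy usage: individuals are just numbers, similarity is plain distance.
# A rule that doubles differences fails for every distinct pair.
print(fairness_violations(lambda x: 2 * x, lambda x, y: abs(x - y), [0.0, 0.1, 0.5]))
```

Deciding what counts as distance (who really is similar to whom, and in what respects) is exactly the question the algorithm punts on, just as Dwork punts to the ethicist.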

One of Cornel West’s early books is called The American Evasion of Philosophy, and — if I may riff on his title more than on the particulars of his argument — this is a classic example of that phenomenon in all of its aspects. First, there is the belief that we don’t need to think philosophically because we can solve our problems by technology; and then, second, when technology as such fails, the recourse to expertise, in this case in the form of an “ethicist.” And then, finally, in the paper Dwork co-authored on fairness that prompted this interview, we find the argument that the parameters of fairness “would be externally imposed, for example, by a regulatory body, or externally proposed, by a civil rights organization,” accompanied by a citation of John Rawls.

In the Evasion of Philosophy sweepstakes, that’s pretty much the trifecta: moral reflection and discernment by ordinary people replaced by technological expertise, academic expertise, and political expertise — the model of expertise being technical through and through. ’Cause that’s just how we roll.

Uber, algorithms, and trust

I encourage you to read Adam Greenfield’s analysis of Uber and its core values — it’s brilliant.

I find myself especially interested in the section in which Greenfield explores this foundational belief: “Interpersonal exchanges are more appropriately mediated by algorithms than by one’s own competence.” It’s a long section, so these excerpts will be pretty long too:

Like other contemporary services, Uber outsources judgments of this type to a trust mechanic: at the conclusion of every trip, passengers are asked to explicitly rate their driver. These ratings are averaged into a score that is made visible to users in the application interface: “John (4.9 stars) will pick you up in 2 minutes.” The implicit belief is that reputation can be quantified and distilled to a single salient metric, and that this metric can be acted upon objectively….

What riders are not told by Uber — though, in this age of ubiquitous peer-to-peer media, it is becoming evident to many that this has in fact been the case for some time — is that they too are rated by drivers, on a similar five-point scale. This rating, too, is not without consequence. Drivers have a certain degree of discretion in choosing to accept or deny ride requests, and to judge from publicly-accessible online conversations, many simply refuse to pick up riders with scores below a certain threshold, typically in the high 3’s.

This is strongly reminiscent of the process that I have elsewhere called “differential permissioning,” in which physical access to everyday spaces and functions becomes ever-more widely apportioned on the basis of such computational scores, by direct analogy with the access control paradigm prevalent in the information security community. Such determinations are opaque to those affected, while those denied access are offered few or no effective means of recourse. For prospective Uber patrons, differential permissioning means that they can be blackballed, and never know why….

And here’s the key point:

All such measures stumble in their bizarre insistence that trust can be distilled to a unitary value. This belies the common-sense understanding that reputation is a contingent and relational thing — that actions a given audience may regard as markers of reliability are unlikely to read that way to all potential audiences. More broadly, it also means that Uber constructs the development of trust between driver and passenger as a circumstance in which algorithmic determinations should supplant rather than rely upon (let alone strengthen) our existing competences for situational awareness, negotiation and the detection of verbal and nonverbal social cues.

Contrast this model to that of MaraMoja Transport, a new company in Nairobi that matches drivers with riders on the basis of personal trust. Users of MaraMoja compare experiences with those of their friends and acquaintances: if someone you know well and like has had a good experience with a driver, then you can feel pretty confident that you’ll have a good experience too. But of course some of your friends will have higher risk tolerances than others; some will prefer speed to friendliness, others safety above all… It’s a kind of multi-dimensional sliding scale, in which you’re not just handed a single number but get the chance to consider and weigh multiple factors.
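
I don’t know MaraMoja’s actual algorithm, but the contrast with a unitary score is easy to sketch. Instead of collapsing every rating into one average for everybody, you can weight each report by how much you trust its source, and each dimension by your own priorities, so that a safety-first rider and a speed-first rider see different scores for the same driver. (All names and numbers below are invented.)

```python
from statistics import mean

# Invented multi-dimensional reviews, each tagged with the
# relationship between the reviewer and you.
reviews = [
    {"source": "close friend", "speed": 5, "friendliness": 3, "safety": 5},
    {"source": "acquaintance", "speed": 4, "friendliness": 5, "safety": 4},
    {"source": "stranger",     "speed": 2, "friendliness": 4, "safety": 3},
]

# Uber-style: everything collapses into one salient number,
# the same for every rider.
unitary = mean(mean([r["speed"], r["friendliness"], r["safety"]]) for r in reviews)

# Relational: weight reviews by trust in their source, and
# dimensions by your own priorities.
trust = {"close friend": 1.0, "acquaintance": 0.6, "stranger": 0.2}
priorities = {"speed": 0.2, "friendliness": 0.3, "safety": 0.5}

def personal_score(reviews, trust, priorities):
    weighted = sum(trust[r["source"]] * sum(priorities[k] * r[k] for k in priorities)
                   for r in reviews)
    return weighted / sum(trust[r["source"]] for r in reviews)

print(f"one number for everyone: {unitary:.2f}")
print(f"a number for you:        {personal_score(reviews, trust, priorities):.2f}")
```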

MaraMoja also rejects Uber’s infamous surge-pricing model in favor of a fixed price based on journey length. So, all in all, like Uber — but human and ethical.