People make progress by discovering and sharing useful ideas. The value of an idea is proportional to the number of people who use it.

Spoken language gave humans a unique capacity for moving an idea from one person’s neurons to another’s. Writing makes it possible for one person to convey an idea to many others. If a document is well written, anyone with a copy can read it and convert the codified knowledge stored in text back into human capital stored in neurons.

Writing is the bottleneck that holds back the rate of diffusion of ideas. Writing is a production process that converts knowledge stored in neurons into knowledge codified as text. Writing clearly and concisely is a time-consuming, multi-stage process that starts with composing and is followed by multiple rounds of editing, pruning, and user testing. The essence of this process is captured by the apology that “I would have written less but I didn’t have the time.”

The quality of written prose should be higher in documents that will have many readers. If an author devotes an extra hour to shortening and improving a text, this might save an additional minute for each reader. If there are even 100 readers, an extra hour of editing that saves 100 minutes of reading reduces the total time required for communication.
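The arithmetic behind this claim can be checked with a short sketch. The function name and the figures passed to it are illustrative assumptions taken from the example above (one hour of editing, one minute saved per reader), not measured values.

```python
def net_minutes_saved(editing_minutes, minutes_saved_per_reader, readers):
    """Total reading time saved across all readers, minus the extra editing time."""
    return minutes_saved_per_reader * readers - editing_minutes


# The example from the text: one extra hour of editing, one minute saved
# per reader, 100 readers.
print(net_minutes_saved(60, 1, 100))

# Under the same assumptions, editing pays for itself once the number of
# readers exceeds editing_minutes / minutes_saved_per_reader.
break_even_readers = 60 / 1
print(break_even_readers)
```

With these assumed numbers, the extra hour pays for itself once a document has more than 60 readers, and every reader beyond that is a net saving.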

Writing clearly to build trust

There are many reasons why we must write clearly. The one that is relevant here is that clear writing is a commitment to integrity. It is, without exaggeration, the foundation for trust in science.

When a scientist with a reputation for integrity makes a claim, most lay readers will trust that it is correct. If the claim is stated clearly, it will be easy for some other scientist to verify whether it is true. If the assertion turns out not to be true, the scrutiny will ratchet up. In one recent case, a careful look revealed that the author fabricated data he claimed to have collected via a survey. This ended his career in science. Because of the personal cost that would follow if they were to make false statements that are uncovered, working scientists have an incentive to be scrupulous about telling the truth. This is why it makes sense for people to trust the precise claims that a scientist makes.

The problem with vague writing is that it lets an author convey a false impression yet retain plausible deniability when someone tries to verify the claim. This is the problem with the sentence from our memo which said, “The current perception of STC as cheaper than Term staff has to be questioned.”

To see how this plays out inside the Bank, consider this clearly stated claim that I have made up:

This statistical analysis demonstrates that policies that increase enrollment in higher education cause an increase in the rate of growth of GDP per capita.

Anyone familiar with the data knows that this claim is almost surely false. The statistical data that bear on this issue can reveal correlations but cannot establish causality. In science, making this claim would damage an author’s reputation.

In Bank writing, the way to convey the same message without risking one’s reputation is to restate the claim vaguely and imprecisely. Now consider this actual claim from a unit in the Bank:

[The] higher education evaluation … confirms the importance of education as an input into growth.

Readers who are not paying careful attention might reach the same conclusion from the second claim as from the first. Yet when challenged, the author of the second claim can accurately say that it might have a different meaning. The verb “confirms” can mean something other than the verb “demonstrates.” Because it is not clear what the phrase “the importance of education as an input into growth” means, the author, when challenged, can plausibly assert that it means something other than “education causes growth.” In short, no one can say that the author of the second claim wrote something that is false because no one knows what the second claim means.

DEC should be the part of the Bank that prevents the entire organization from using this type of vague persuasion. But we can be the voice that criticizes vague overstatement only if we are consistent in setting and meeting high standards for clarity of our prose and are willing to admit a mistake when we make one.

So to build trust, the highest priority for the Office of the Chief Economist will be to insist that any document that this office produces is clear, concise, and correct to the best of our knowledge. We may not be able to prevent other units from publishing poorly written documents, but nothing will prevent us from keeping track of the relative strengths and weaknesses of all Bank publications.

How information technology can help

When a linguist and a computer scientist looked at the Bank’s annual reports, they found patterns we can track. One is the use of “and” to offer something for everyone:

… promote corporate governance and competition policies and reform and privatize state-owned enterprises and labor market/social protection reform …

This phrase has 19 words if we count “state-owned” as two words and “market/social” as two words. The word “and” is repeated four times, so its frequency is 4/19 or 21%. In the Bank reports, the word “and” displaces “the” from its usual position as the most frequently used word in English. As this figure shows, the trend has been up.
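The count for the phrase above can be reproduced in a few lines. This is a minimal sketch using only the Python standard library. The function name `word_frequencies` and the tokenization rule (split on anything that is not a letter or apostrophe) are assumptions chosen to match the counting convention described above, where “state-owned” and “market/social” each count as two words.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return (counts per word, total number of words) for a piece of text."""
    # Splitting on every non-letter character means hyphenated and
    # slash-joined terms are counted as separate words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words), len(words)


phrase = (
    "promote corporate governance and competition policies and reform "
    "and privatize state-owned enterprises and labor market/social "
    "protection reform"
)

counts, total = word_frequencies(phrase)
and_share = counts["and"] / total
print(f"'and' appears {counts['and']} times in {total} words ({and_share:.0%})")
```

Run over a full report instead of a single phrase, the same function gives the share of “and” for that year, which is the kind of series that can be tracked over time.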

Consider this second measure, the ratio of nouns to verbs.

Counting word frequencies is simple. Identifying parts of speech is not. Parsing sentences to identify subjects and predicates is even harder. But these challenges have all been solved. Anyone can download and use open source packages that do any of these tasks, and combine them with software libraries for machine learning.

This opens up interesting possibilities. My favorite book on editing recommends looking at “the first seven or eight words in a sentence. If you do not see a character as a subject and a verb as a specific action, you have a candidate for a revision.” With existing software, we can find the nouns and verbs and identify the subjects. We can then use a standard type of machine learning algorithm (a classifier) to identify the difference between a phrase that has a character who takes a specific action–“The Prime Minister reorganized …”–and one that does not–“The reorganization took place …” Or “this program costs as much as the alternative …” versus “The current perception of STC as cheaper than Term staff has to be questioned.”
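To make the idea concrete, here is a deliberately crude stand-in for the approach described above. It is not a parser or a trained classifier. It is a rule-based heuristic that looks at the first eight words of a sentence and flags it if one of them looks like an action turned into an abstract noun. The suffix list, the length threshold, and the function names are all assumptions made for illustration. A real system would use a part-of-speech tagger and a classifier trained on what editors actually change.

```python
import re

# Assumption: these endings are a rough proxy for nominalizations
# ("reorganization", "perception"). The list will produce false positives
# (for example "government") and miss many real cases.
NOMINAL_SUFFIXES = ("tion", "sion", "ment", "ance", "ence")


def opening_words(sentence, n=8):
    """Return the first n words of a sentence, ignoring punctuation."""
    return re.findall(r"[A-Za-z']+", sentence)[:n]


def flag_for_revision(sentence, n=8):
    """Return True if an apparent nominalization is among the first n words."""
    return any(
        len(word) > 6 and word.lower().endswith(NOMINAL_SUFFIXES)
        for word in opening_words(sentence, n)
    )


# The four openings quoted in the text.
examples = [
    "The Prime Minister reorganized",
    "The reorganization took place",
    "this program costs as much as the alternative",
    "The current perception of STC as cheaper than Term staff has to be questioned.",
]

for sentence in examples:
    label = "candidate for revision" if flag_for_revision(sentence) else "ok"
    print(f"{label}: {sentence}")
```

On these four openings the heuristic flags the second and fourth and passes the first and third. That agreement is only suggestive, since the rule was written with these examples in view.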

One way to have the algorithm learn is to have it flag sentences that it finds problematic and then observe what a skilled editor does. Among the many resources of the Bank, we have many writers and many editors. This is where people from ECR have offered to help.

To experiment with something like this, researchers in the Bank should be able to spin up a server in the cloud, download some open-source software, and start experimenting, all within minutes. When I arrived, this was not possible because people in ITS did not trust people from DEC and, reading between the lines, were tired of the dismissive arrogance that people from DEC displayed.