New analysis contradicts findings published in Science

Do people outside the fields of health, technology, and science really read the medical, scientific, or technical journal articles tucked away in university libraries, sometimes on dusty shelves, if there's no 'translation' into nontechnical language? Richard Sproat's article, "A statistical comparison of written language and non-linguistic symbol systems," is published online in the June 2014 issue of the journal Language. Sproat is a Research Scientist at Google; the study is based on work he previously did at the Oregon Health and Science University. How many people read mainstream media news about health trends or scientific research, then look behind the words and check the researcher's reputation or the language used in the research?

Not all consumer news publications translate the technical language of science, medicine, healthcare, or even psychology into plain language that the average reader or consumer who hasn't taken science courses can understand and use to interpret findings or to think about how a study's results might apply to his or her own health. A new analysis contradicts findings published in Science, says a June 2, 2014 news release of the same name.

The new research presents evidence that the methods employed by the authors of articles published in prestigious international science journals are not supported by a more rigorous linguistic analysis.

Sproat's analysis comes in response to a number of papers published in high-profile science publications that have argued that statistical analyses of symbol combinations can provide insights into the origins of written language. One paper, by Rajesh Rao (University of Washington), Iravatham Mahadevan (Indus Research Centre) and colleagues at the TATA Institute in Mumbai, India, appeared in 2009 in the journal Science.

It argued that a particular statistical measure — bigram conditional entropy — showed that the Indus Valley symbols behave more like those in linguistic texts than those of non-linguistic systems. In another paper in the Proceedings of the Royal Society, Rob Lee and colleagues (University of Exeter) claimed that a more sophisticated set of entropic measures put Pictish symbols in the same category as linguistic texts.
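The bigram conditional entropy at issue can be sketched in a few lines of Python. This is a hedged illustration of the general statistic only, not Rao and colleagues' exact procedure, which involved smoothing and comparisons against reference corpora:

```python
import math
from collections import Counter

def bigram_conditional_entropy(seq):
    """Conditional entropy H(next symbol | current symbol), in bits.

    Low values mean the next symbol is highly predictable from the
    current one; for a two-symbol alphabet the maximum is 1 bit.
    """
    bigrams = list(zip(seq, seq[1:]))
    bigram_counts = Counter(bigrams)
    first_counts = Counter(a for a, _ in bigrams)
    n = len(bigrams)
    h = 0.0
    for (a, b), count in bigram_counts.items():
        p_ab = count / n                       # joint probability p(a, b)
        p_b_given_a = count / first_counts[a]  # conditional probability p(b | a)
        h -= p_ab * math.log2(p_b_given_a)
    return h

# A perfectly alternating sequence is fully predictable: entropy 0.
print(bigram_conditional_entropy("ABABABABAB"))  # 0.0
```

The debate is not over computing this number but over what it shows: many kinds of structured sequences, linguistic or not, fall in the same entropy range.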

Both papers (and other subsequent papers by Rao and his colleagues) received a large amount of attention from the news media. In these popular accounts, the techniques were often presented as demonstrating that the symbol systems in question were written language, though this was not necessarily the authors' intention. Do you think the average consumer would understand these media reports, which aim to make the complex easier to follow, sometimes in a step-by-step manner?

One open question is whether journalists themselves understand statistical concepts well enough to interpret them in simple, plain language for the average reader who has never taken a course in statistics, or who has forgotten the math learned in high school decades ago, for example, because his or her job focused mainly on greeting people, teaching literature, cooking food, or driving a truck.

Would the average reader interested in his or her own healthcare understand statistical techniques when reading news media pieces on medicine?

Understanding statistical techniques for analyzing symbol systems and what they do and do not show is of fundamental importance to language science, as there are many old or ancient symbol systems whose function is largely or completely unknown. Examples include the Easter Island rongorongo inscriptions (19th century), the Pictish symbols of Scotland (6th century onwards), and the Indus Valley symbols (Northern India, Pakistan, 3rd millennium BCE).

As part of his work on the question of whether symbol systems such as these exemplify written language, Sproat developed large, structured collections of text, or corpora, from a variety of non-linguistic systems, both ancient and modern, including Mesopotamian deity symbols (Babylonia), totem poles (Pacific Northwest), Pennsylvania barn stars ("hex signs"), weather-forecast icon sequences, and Unicode characters for Asian emoticons. He compared these to corpora developed from fourteen languages representing a variety of different writing-system types, both ancient and modern.

From the point of view of the measures proposed in the previous literature, all of the non-linguistic symbol systems in Sproat's corpora behaved the same as the linguistic systems. However, he also found that a novel measure of the amount of local repetition, together with a version of one of Lee and colleagues' entropic measures run with a different setting than they used, could accurately distinguish the two categories of symbol systems. Moreover, his statistical procedure, unlike the earlier ones, classifies both the Pictish and Indus Valley symbols as non-linguistic.
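The news release does not spell out how Sproat's local-repetition measure is defined. Purely as an illustration of the kind of statistic involved, one very simple version is the fraction of adjacent symbol pairs that are identical; this is a hypothetical stand-in, not his actual measure:

```python
def adjacent_repetition_rate(seq):
    """Fraction of adjacent symbol pairs that are identical.

    A deliberately simple, hypothetical stand-in for a 'local
    repetition' statistic -- not Sproat's actual measure, which
    the news release does not specify in detail.
    """
    pairs = list(zip(seq, seq[1:]))
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)

# "AAB" has pairs ("A","A") and ("A","B"): one of two is a repeat.
print(adjacent_repetition_rate("AAB"))  # 0.5
```

The point of such a statistic is that some non-linguistic systems repeat the same symbol in runs far more often than written language does.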

Despite these promising results, Sproat cautions against relying too heavily on statistical measures to analyze ancient symbol systems that have not been deciphered. All statistical measures are heavily influenced by, among other things, the size of the corpus, the length of texts, and what kind of text is involved.

Shopping lists have statistical properties

Shopping lists, for example, have statistical properties that distinguish them from running prose from a novel. He argues that a truly reliable demonstration that a collection of symbols exemplifies written language requires supporting empirical evidence, such as a credible decipherment or independent archeological evidence of a related culture of active literacy. What is clear, however, is that the previously proposed statistical methods simply do not work for the intended purpose.

Sproat's work was supported in part by the National Science Foundation under grant number BCS-1049308. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation, nor of Google. The Linguistic Society of America (LSA) publishes the peer-reviewed journal, Language, four times per year. The LSA is the largest national professional society representing the field of linguistics. Its mission is to advance the scientific study of language.
