Archive for the ‘language and languages’ Category

The Voice of Emotion across Species: How Do Human Listeners Recognize Animals’ Affective States?

April 14, 2014

The Voice of Emotion across Species: How Do Human Listeners Recognize Animals’ Affective States?
Source: PLoS ONE

Voice-induced cross-taxa emotional recognition is the ability to understand the emotional state of another species based on its voice. In the past, induced affective states, experience-dependent higher cognitive processes or cross-taxa universal acoustic coding and processing mechanisms have been discussed to underlie this ability in humans. The present study sets out to distinguish the influence of familiarity and phylogeny on voice-induced cross-taxa emotional perception in humans. For the first time, two perspectives are taken into account: the self- (i.e. emotional valence induced in the listener) versus the others-perspective (i.e. correct recognition of the emotional valence of the recording context). Twenty-eight male participants listened to 192 vocalizations of four different species (human infant, dog, chimpanzee and tree shrew). Stimuli were recorded either in an agonistic (negative emotional valence) or affiliative (positive emotional valence) context. Participants rated the emotional valence of the stimuli adopting self- and others-perspective by using a 5-point version of the Self-Assessment Manikin (SAM). Familiarity was assessed based on subjective rating, objective labelling of the respective stimuli and interaction time with the respective species. Participants reliably recognized the emotional valence of human voices, whereas the results for animal voices were mixed. The correct classification of animal voices depended on the listener’s familiarity with the species and the call type/recording context, whereas there was less influence of induced emotional states and phylogeny. Our results provide first evidence that explicit voice-induced cross-taxa emotional recognition in humans is shaped more by experience-dependent cognitive mechanisms than by induced affective states or cross-taxa universal acoustic coding and processing mechanisms.


Finding That College Students Cluster in Majors Based on Differing Patterns of Spatial Visualization and Language Processing Speeds

April 14, 2014

Finding That College Students Cluster in Majors Based on Differing Patterns of Spatial Visualization and Language Processing Speeds
Source: SAGE Open

For over 30 years, researchers such as Eisenberg and McGinty have investigated the relationship between 3-D visualization skills and choice of college major. Results of the present study support the fact that science and math majors tend to do well on a measure of 3-D visualization. Going beyond these earlier studies, the present study investigated whether a measure of Rapid Automatic Naming of Objects—which is normally used to screen for elementary school students who might struggle with speech, language, literacy, and numeracy—would further differentiate the choice of majors by college students. Far more research needs to be conducted, but results indicated that college students differentially clustered in scatterplot quadrants defined by the two screening assessments. Furthermore, several of these clusters, plus a statistical multiplier, may lead to a new understanding of students with phonological processing differences, learning disabilities, and speech and language impairments.

Speaking of Corporate Social Responsibility

March 24, 2014

Speaking of Corporate Social Responsibility
Source: Harvard Business School Working Papers

We argue that the language spoken by corporate decision makers influences their firms’ social responsibility and sustainability practices. Linguists suggest that obligatory future-time-reference (FTR) in a language reduces the psychological importance of the future. Prior research has shown that speakers of strong FTR languages (such as English, French, and Spanish) exhibit less future-oriented behavior (Chen, 2013). Yet, research has not established how this mechanism may affect the future-oriented activities of corporations. We theorize that companies with strong-FTR languages as their official/working language would have less of a future orientation and so perform worse in future-oriented activities such as corporate social responsibility (CSR) compared to those in weak-FTR language environments. Examining thousands of global companies across 59 countries from 1999 to 2011, we find support for our theory and further that the negative association between FTR and CSR performance is weaker for firms that have greater exposure to diverse global languages as a result of (a) being headquartered in countries with a higher degree of globalization, (b) having a higher degree of internationalization, and (c) having a CEO with more international experience. Our results suggest that language use by corporations is a key cultural variable that is a strong predictor of CSR and sustainability.

Predicting the Risk of Suicide by Analyzing the Text of Clinical Notes

February 6, 2014

Predicting the Risk of Suicide by Analyzing the Text of Clinical Notes
Source: PLoS ONE

We developed linguistics-driven prediction models to estimate the risk of suicide. These models were generated from unstructured clinical notes taken from a national sample of U.S. Veterans Administration (VA) medical records. We created three matched cohorts: veterans who committed suicide, veterans who used mental health services and did not commit suicide, and veterans who did not use mental health services and did not commit suicide during the observation period (n = 70 in each group). From the clinical notes, we generated datasets of single keywords and multi-word phrases, and constructed prediction models using a machine-learning algorithm based on a genetic programming framework. The resulting inference accuracy was consistently 65% or more. Our data therefore suggests that computerized text analytics can be applied to unstructured medical records to estimate the risk of suicide. The resulting system could allow clinicians to potentially screen seemingly healthy patients at the primary care level, and to continuously evaluate the suicide risk among psychiatric patients.

“How Old Do You Think I Am?”: A Study of Language and Age in Twitter

January 15, 2014

“How Old Do You Think I Am?”: A Study of Language and Age in Twitter (PDF)
Source: Association for the Advancement of Artificial Intelligence

In this paper we focus on the connection between age and language use, exploring age prediction of Twitter users based on their tweets. We discuss the construction of a fine-grained annotation effort to assign ages and life stages to Twitter users. Using this dataset, we explore age prediction in three different ways: classifying users into age categories, by life stages, and predicting their exact age. We find that an automatic system achieves better performance than humans on these tasks and that both humans and the automatic systems have difficulties predicting the age of older people. Moreover, we present a detailed analysis of variables that change with age. We find strong patterns of change, and that most changes occur at young ages.

Discussion in Postsecondary Classrooms

January 7, 2014

Discussion in Postsecondary Classrooms
Source: SAGE Open

Spoken language is, arguably, the primary means by which teachers teach and students learn. Much of the literature on language in classrooms has focused on discussion that is seen as both a method of instruction and a curricular outcome. While much of the research on discussion has focused on K-12 classrooms, there is also a body of research examining the efficacy of discussion in postsecondary settings. This article provides a review of this literature in order to consider the effect of discussion on student learning in college and university classrooms, the prevalence of discussion in postsecondary settings, and the quality of discussion in these settings. In general, the results of research on the efficacy of discussion in postsecondary settings are mixed. More seriously, researchers have not been explicit about the meaning of discussion and much of what is called discussion in this body of research is merely recitation with minimal levels of student participation. Although the research on discussion in college and university classrooms is inconclusive, some implications can be drawn from this review of the research including the need for future researchers to clearly define what they mean by “discussion.”

Early Education for Dual Language Learners

November 25, 2013

Early Education for Dual Language Learners (PDF)
Source: Migration Policy Institute

This report profiles the population of young Dual Language Learners (DLLs), who represent nearly one-third of all US children under age 6, outlining their school readiness and patterns of achievement. The report evaluates the research on early care and education approaches that have been shown to support higher levels of language and literacy development and achievement for this child population, most but not all of whom are children of immigrants. Assessing the features of high-quality programs that have been shown to improve school readiness among the DLL population, the author finds there are a number of readily implementable practices that can be put into effect.

Is “Huh?” a Universal Word? Conversational Infrastructure and the Convergent Evolution of Linguistic Items

November 18, 2013

Is “Huh?” a Universal Word? Conversational Infrastructure and the Convergent Evolution of Linguistic Items
Source: PLoS ONE

A word like Huh? – used as a repair initiator when, for example, one has not clearly heard what someone just said – is found in roughly the same form and function in spoken languages across the globe. We investigate it in naturally occurring conversations in ten languages and present evidence and arguments for two distinct claims: that Huh? is universal, and that it is a word. In support of the first, we show that the similarities in form and function of this interjection across languages are much greater than expected by chance. In support of the second claim we show that it is a lexical, conventionalised form that has to be learnt, unlike grunts or emotional cries. We discuss possible reasons for the cross-linguistic similarity and propose an account in terms of convergent evolution. Huh? is a universal word not because it is innate but because it is shaped by selective pressures in an interactional environment that all languages share: that of other-initiated repair. Our proposal enhances evolutionary models of language change by suggesting that conversational infrastructure can drive the convergent cultural evolution of linguistic items.

The Effect of Language on Economic Behavior: Evidence from Savings Rates, Health Behaviors, and Retirement Assets

September 11, 2013

The Effect of Language on Economic Behavior: Evidence from Savings Rates, Health Behaviors, and Retirement Assets (PDF)
Source: American Economic Review

Languages differ widely in the ways they encode time. I test the hypothesis that languages that grammatically associate the future and the present foster future-oriented behavior. This prediction arises naturally when well-documented effects of language structure are merged with models of intertemporal choice. Empirically, I find that speakers of such languages save more, retire with more wealth, smoke less, practice safer sex, and are less obese. This holds both across countries and within countries when comparing demographically similar native households. The evidence does not support the most obvious forms of common causation. I discuss implications for theories of intertemporal choice.

CA — Access to Justice in Both Official Languages: Improving the Bilingual Capacity of the Superior Court Judiciary

August 23, 2013

Access to Justice in Both Official Languages: Improving the Bilingual Capacity of the Superior Court Judiciary
Source: Office of the Commissioner of Official Languages

For Canadians who are members of official language minority communities to feel comfortable using the official language of their choice before the superior courts, it is crucial for these courts to be able to offer all their services and to function in English and in French. In this regard, the bilingual capacity of the judiciary for superior courts is a sine qua non condition for access to the Canadian justice system in both official languages and ensuring the rights of litigants are not prejudiced by their language choice.

For superior courts and courts of appeal to be able to respect the language rights of litigants, it is therefore essential for the federal Minister of Justice to appoint an appropriate number of bilingual judges with the language skills necessary to preside over cases in the minority official language. Currently, the institutional bilingual capacity of the superior courts remains a challenge in a number of provinces and territories. Another challenge lies in judges’ ability to maintain their language skills at a level that is sufficient to preside over a hearing in their second official language.

The Commissioner of Official Languages of Canada, in partnership with François Boileau, the French Language Services Commissioner of Ontario and Michel Carrier, the Commissioner of Official Languages for New Brunswick, decided in 2012 to conduct an in-depth study on two issues that have an impact on the bilingual capacity of superior court judges: the judicial appointment process and the language training available to judges appointed to superior courts.
The study looked at the appointment processes for the superior courts of six provinces: Nova Scotia, New Brunswick, Quebec, Ontario, Manitoba and Alberta. It also took into account certain practices for appointing provincial judges in New Brunswick, Quebec, Ontario and Manitoba.

From the consultations conducted as part of the study, it was determined that the judicial appointment process does not guarantee sufficient bilingual capacity among the judiciary to respect the language rights of Canadians at all times.

New Census Bureau Interactive Map Shows Languages Spoken in America

August 16, 2013

New Census Bureau Interactive Map Shows Languages Spoken in America
Source: U.S. Census Bureau

The U.S. Census Bureau today released an interactive, online map pinpointing the wide array of languages spoken in homes across the nation, along with a detailed report on rates of English proficiency and the growing number of speakers of other languages.

The 2011 Language Mapper shows where people speaking specific languages other than English live, with dots representing how many people speak each of 15 different languages. For each language, the mapper shows the concentration of those who report that they speak English less than “very well,” a measure of English proficiency. The tool uses data collected through the American Community Survey from 2007 to 2011.

Also released today, the report, Language Use in the United States: 2011, [PDF] details the number of people speaking languages other than English at home and their ability to speak English, by selected social and demographic characteristics. It shows that more than half (58 percent) of U.S. residents 5 and older who speak a language other than English at home also speak English “very well.” The data, taken from the American Community Survey, are provided for the nation, states and metropolitan and micropolitan areas.

The report shows that the percent speaking English “less than very well” grew from 8.1 percent in 2000 to 8.7 percent in 2007, but stayed at 8.7 percent in 2011. The percent speaking a language other than English at home went from 17.9 percent in 2000 to 19.7 percent in 2007, while continuing upward to 20.8 percent in 2011.

New From the GAO

August 15, 2013

New GAO Report
Source: Government Accountability Office

Education Needs to Further Examine Data Collection on English Language Learners in Charter Schools. GAO-13-655R, July 17.

Gentrifier? Who, Me? Interrogating the Gentrifier in the Mirror

August 8, 2013

Gentrifier? Who, Me? Interrogating the Gentrifier in the Mirror
Source: International Journal of Urban and Regional Research

Schlichtman and Patch suggest that there is an elephant sitting in the academic corner: while urbanists often use ‘gentrification’ as a pejorative term in formal and informal academic conversation, many urbanists are gentrifiers themselves. Even though urbanists have this firsthand experience with the process, this familiarity makes little impact on scholarly debate. There is, Schlichtman and Patch argue, an artificial distance in accounts of gentrification because researchers have not adequately examined their own relationship to the process. Utilizing a simple diagnostic tool that includes ten common aspects of gentrification, they compose two autoethnographic memoirs to begin this dialogue.

Building Large Collections of Chinese and English Medical Terms from Semi-Structured and Encyclopedia Websites

July 30, 2013

Building Large Collections of Chinese and English Medical Terms from Semi-Structured and Encyclopedia Websites
Source: Microsoft Research

The objective is to build large collections of medical terms from semi-structured information sources (e.g. tables, lists, etc.) and encyclopedia sites on the web. The terms are classified into the three semantic categories, Medical Problems, Medications, and Medical Tests, which were used in i2b2 challenge tasks. We developed two systems, one for Chinese and another for English terms. The two systems share the same methodology and use the same software with minimal language-dependent parts. We produced large collections of terms by exploiting billions of semi-structured information sources and encyclopedia sites on the Web. The standard performance metric of recall (R) is extended to three different types of recall to take the surface variability of terms into consideration: Surface Recall, Object Recall, and Surface Head Recall. We use two test sets for Chinese. For English, we use a collection of terms in the 2010 i2b2 text. Two collections of terms, one for English and the other for Chinese, have been created. The terms in these collections are classified as one of Medical Problems, Medications, or Medical Tests in the i2b2 challenge tasks. The English collection contains 49,249 (Problems), 89,591 (Medications) and 25,107 (Tests) terms, while the Chinese one contains 66,780 (Problems), 101,025 (Medications), and 15,032 (Tests) terms. The proposed method of constructing a large collection of medical terms is both efficient and effective, and, most of all, independent of language. The collections will be made publicly available.
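
For readers unfamiliar with the baseline metric, here is a minimal sketch of plain recall: the share of a gold-standard term list that a collection manages to cover. The abstract does not define its three recall variants, so this does not reproduce them. The `normalize` hook is a hypothetical stand-in showing how tolerance for surface variation could be added, and the example terms are made up.

```python
def recall(gold_terms, collected_terms, normalize=lambda term: term):
    """Fraction of gold-standard terms that appear in the collection.

    `normalize` is a hypothetical hook (not from the paper) applied to
    both sides before matching, e.g. to ignore letter case.
    """
    gold = {normalize(term) for term in gold_terms}
    if not gold:
        raise ValueError("gold_terms must not be empty")
    collected = {normalize(term) for term in collected_terms}
    return len(gold & collected) / len(gold)


# Made-up terms for illustration only.
gold = ["Chest Pain", "aspirin", "MRI scan", "fever"]
collected = ["chest pain", "aspirin", "x-ray"]

exact = recall(gold, collected)                          # exact string match
relaxed = recall(gold, collected, normalize=str.casefold)  # case-insensitive
```

With exact matching only "aspirin" is found; ignoring case also credits "chest pain", which is the kind of surface variability the abstract says the extended metrics are meant to handle.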

Mining Acronym Expansions and Their Meanings Using Query Click Log

June 4, 2013

Mining Acronym Expansions and Their Meanings Using Query Click Log

Source: Microsoft Research

Acronyms are abbreviations formed from the initial components of words or phrases. Acronym usage is becoming more common in web searches, email, text messages, tweets, blogs and posts. Acronyms are typically ambiguous and often disambiguated by context words. Given either just an acronym as a query or an acronym with a few context words, it is immensely useful for a search engine to know the most likely intended meanings, ranked by their likelihood. To support such online scenarios, we study the offline mining of acronyms and their meanings in this paper. For each acronym, our goal is to discover all distinct meanings and for each meaning, compute the expanded string, its popularity score and a set of context words that indicate this meaning. Existing approaches are inadequate for this purpose. Our main insight is to leverage "co-clicks" in search engine query click log to mine expansions of acronyms. There are several technical challenges such as ensuring 1:1 mapping between expansions and meanings, handling of "tail meanings" and extracting context words. We present a novel, end-to-end solution that addresses the above challenges. We further describe how web search engines can leverage the mined information for prediction of intended meaning for queries containing acronyms. Our experiments show that our approach (i) discovers the meanings of acronyms with high precision and recall, (ii) significantly complements existing meanings in Wikipedia, and (iii) accurately predicts intended meaning for online queries with over 90% precision.
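
The co-click idea can be illustrated with a toy sketch. This is not the authors' system: it assumes a click log is a list of hypothetical `(query, clicked_url)` pairs, treats a query as a candidate expansion when it shares a clicked URL with the acronym query and its word initials spell the acronym, and uses the share of shared clicks as a rough popularity score. The log entries and URLs below are invented.

```python
from collections import Counter, defaultdict


def initials(phrase):
    """First letter of each word, lowercased: 'New York' -> 'ny'."""
    return "".join(word[0] for word in phrase.lower().split())


def mine_expansions(click_log, acronym):
    """Rank candidate expansions of `acronym` by shared-click share."""
    acronym = acronym.lower()
    # Count how often each query led to a click on each URL.
    clicks_by_url = defaultdict(Counter)
    for query, url in click_log:
        clicks_by_url[url][query.lower()] += 1

    scores = Counter()
    for queries in clicks_by_url.values():
        acronym_clicks = queries.get(acronym, 0)
        if acronym_clicks == 0:
            continue  # this URL was never clicked for the acronym query
        for query, count in queries.items():
            if query != acronym and initials(query) == acronym:
                scores[query] += min(acronym_clicks, count)

    total = sum(scores.values())
    if total == 0:
        return []
    return [(query, score / total) for query, score in scores.most_common()]


# Invented click log for illustration only.
log = [
    ("cmu", "site-a.example"),
    ("cmu", "site-a.example"),
    ("carnegie mellon university", "site-a.example"),
    ("carnegie mellon university", "site-a.example"),
    ("cmu", "site-b.example"),
    ("central michigan university", "site-b.example"),
    ("cmu football", "site-b.example"),
]
ranked = mine_expansions(log, "CMU")
```

The initials filter is a deliberately crude simplification; the challenges the abstract lists (1:1 mapping between expansions and meanings, tail meanings, context words) are exactly what a sketch this small leaves out.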

Word Diffusion and Climate Science

January 28, 2013

Word Diffusion and Climate Science

Source: PLoS ONE

As public and political debates often demonstrate, a substantial disjoint can exist between the findings of science and the impact it has on the public. Using climate-change science as a case example, we reconsider the role of scientists in the information-dissemination process, our hypothesis being that important keywords used in climate science follow “boom and bust” fashion cycles in public usage. Representing this public usage through extraordinary new data on word frequencies in books published up to the year 2008, we show that a classic two-parameter social-diffusion model closely fits the comings and goings of many keywords over generational or longer time scales. We suggest that the fashions of word usage contribute an empirical, possibly regular, correlate to the impact of climate science on society.

See: Public acceptance of climate change affected by word usage (EurekAlert!)

Lake Superior State University 2013 Banished Words List

December 31, 2012

Lake Superior State University 2013 Banished Words List

Source: Lake Superior State University

While the U.S. Congress has been kicking the can down the road and inching closer to the fiscal cliff, the word gurus at Lake Superior State University have doubled-down on their passion for the language and have released their 38th annual List of Words to be Banished from the Queen’s English for Misuse, Overuse and General Uselessness.

The list, compiled from nominations sent to LSSU throughout the year, is released each year on New Year’s Eve. It dates back to Dec. 31, 1975, when former LSSU Public Relations Director Bill Rabe (RAY-bee) and some colleagues cooked up the whimsical idea to banish overused words and phrases from the language. They issued the first list on New Year’s Day 1976. Much to the delight of word enthusiasts everywhere, the list has stayed the course into a fourth decade.

Through the years, LSSU has received tens of thousands of nominations for the list, which is closing in on its 1,000th banishment.

This year’s list is culled from nominations received mostly through the university’s website. Word-watchers target pet peeves from everyday speech, as well as from the news, fields of education, technology, advertising, politics and more. A committee makes a final cut in late December.

So, let’s see what’s trending. Grab your favorite superfood (boneless wings) as the list creators at LSSU reveal (spoiler alert!) their bucket list of misused, overused and generally useless words and phrases. YOLO!

Increases in Individualistic Words and Phrases in American Books, 1960–2008

July 20, 2012

Increases in Individualistic Words and Phrases in American Books, 1960–2008
Source: PLoS ONE

Cultural products such as song lyrics, television shows, and books reveal cultural differences, including cultural change over time. Two studies examine changes in the use of individualistic words (Study 1) and phrases (Study 2) in the Google Books Ngram corpus of millions of books in American English. Current samples from the general population generated and rated lists of individualistic words and phrases (e.g., “unique,” “personalize,” “self,” “all about me,” “I am special,” “I’m the best”). Individualistic words and phrases increased in use between 1960 and 2008, even when controlling for changes in communal words and phrases. Language in American books has become increasingly focused on the self and uniqueness in the decades since 1960.

Dating the Origin of Language Using Phonemic Diversity

June 18, 2012

Dating the Origin of Language Using Phonemic Diversity
Source: PLoS ONE

Language is a key adaptation of our species, yet we do not know when it evolved. Here, we use data on language phonemic diversity to estimate a minimum date for the origin of language. We take advantage of the fact that phonemic diversity evolves slowly and use it as a clock to calculate how long the oldest African languages would have to have been around in order to accumulate the number of phonemes they possess today. We use a natural experiment, the colonization of Southeast Asia and Andaman Islands, to estimate the rate at which phonemic diversity increases through time. Using this rate, we estimate that present-day languages date back to the Middle Stone Age in Africa. Our analysis is consistent with the archaeological evidence suggesting that complex human behavior evolved during the Middle Stone Age in Africa, and does not support the view that language is a recent adaptation that has sparked the dispersal of humans out of Africa. While some of our assumptions require testing and our results rely at present on a single case-study, our analysis constitutes the first estimate of when language evolved that is directly based on linguistic data.

A Comparative Study of PDF Generation Methods: Measuring Loss of Fidelity When Converting Arabic and Persian MS Word Files to PDF

June 9, 2011

A Comparative Study of PDF Generation Methods: Measuring Loss of Fidelity When Converting Arabic and Persian MS Word Files to PDF
Source: Mitre Corporation

Converting files to Portable Document Format (PDF) is popular due to the format’s many advantages. For example, PDF allows an author to control or preserve the rendering of a digital document, distribute it to other systems, and ensure that it displays in a viewer as intended.

From the perspective of Human Language Technology (HLT), however, PDFs are problematic. PDF is a display-oriented digital document format; the point of PDF is to preserve the appearance of a document, not to preserve the original electronic text. We observed errors in PDF-extracted text indicating that either the PDF generator or extractor, or both, mishandled the document structure, character data, and/or entire textual objects. And we learned that other HLT researchers reported data loss when extracting electronic text from PDFs. This motivated further study of digital document data exchange using PDFs.

MITRE conducted an exploratory study of data exchange using PDF in order to investigate the data loss phenomenon. We limited our study to Middle Eastern electronic text: specifically Arabic and Persian. The study included a test for scoring PDF generation methods—(a) using a common, best-practice setup to generate PDFs and extract text, and (b) using character accuracy to quantify the quality of PDF-extracted text. We ranked 8 methods according to the resulting accuracy scores. The 8 methods map to 3 core PDF generation classes. At best, the Microsoft Word class resulted in 42% Overall Accuracy. Best scores for the PDFMaker and Acrobat Distiller/PScript5.dll classes were 95% and 96%, respectively.
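
As a rough illustration of what a character accuracy score can look like, the sketch below uses one common definition: the number of reference characters minus the edit distance to the extracted text, divided by the number of reference characters. The summary does not spell out MITRE's exact formula, so this definition, the function names, and the sample strings are all assumptions.

```python
def edit_distance(reference, extracted):
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(extracted) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, ext_char in enumerate(extracted, start=1):
            substitution = previous[j - 1] + (ref_char != ext_char)
            current.append(min(previous[j] + 1,      # character missing from extraction
                               current[j - 1] + 1,   # spurious character in extraction
                               substitution))
        previous = current
    return previous[-1]


def character_accuracy(reference, extracted):
    """Share of reference characters recovered, floored at zero."""
    if not reference:
        return 1.0 if not extracted else 0.0
    errors = edit_distance(reference, extracted)
    return max(0.0, (len(reference) - errors) / len(reference))


# A spurious space inside a four-letter token costs one edit.
original = "سلام"
extracted = "سل ام"
score = character_accuracy(original, extracted)
```

Under this definition a single spurious space in a four-character token already drops the score to 0.75, which gives a sense of how quickly extraction errors of the kind described here add up.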

This paper explains our tests and discusses the results, including evidence that using PDF for data exchange of typical Arabic and Persian documents results in a loss of important electronic text content. This loss confuses human language technologies such as search engines, machine translation engines, computer-assisted translation tools, named entity recognizers, and information extractors.

Furthermore, most of the spurious newlines, spurious spaces in tokens, spurious character substitutions, and entity errors observed in the study were due to the PDF generation method, rather than the PDF text extractor. So, using a common configuration to convert reliable electronic text to PDF for data exchange causes irretrievable loss of electronic text on the receiving end.

+ Full Paper (PDF)

