Voice-induced cross-taxa emotional recognition is the ability to understand the emotional state of another species based on its voice. In the past, induced affective states, experience-dependent higher cognitive processes, or cross-taxa universal acoustic coding and processing mechanisms have been proposed to underlie this ability in humans. The present study sets out to distinguish the influence of familiarity and phylogeny on voice-induced cross-taxa emotional perception in humans. For the first time, two perspectives are taken into account: the self-perspective (i.e. emotional valence induced in the listener) versus the others-perspective (i.e. correct recognition of the emotional valence of the recording context). Twenty-eight male participants listened to 192 vocalizations of four different species (human infant, dog, chimpanzee and tree shrew). Stimuli were recorded either in an agonistic (negative emotional valence) or affiliative (positive emotional valence) context. Participants rated the emotional valence of the stimuli adopting self- and others-perspective by using a 5-point version of the Self-Assessment Manikin (SAM). Familiarity was assessed based on subjective rating, objective labelling of the respective stimuli and interaction time with the respective species. Participants reliably recognized the emotional valence of human voices, whereas the results for animal voices were mixed. The correct classification of animal voices depended on the listener’s familiarity with the species and the call type/recording context, whereas there was less influence of induced emotional states and phylogeny. Our results provide the first evidence that explicit voice-induced cross-taxa emotional recognition in humans is shaped more by experience-dependent cognitive mechanisms than by induced affective states or cross-taxa universal acoustic coding and processing mechanisms.
Finding That College Students Cluster in Majors Based on Differing Patterns of Spatial Visualization and Language Processing Speeds
For over 30 years, researchers such as Eisenberg and McGinty have investigated the relationship between 3-D visualization skills and choice of college major. Results of the present study confirm that science and math majors tend to do well on a measure of 3-D visualization. Going beyond these earlier studies, the present study investigated whether a measure of Rapid Automatic Naming of Objects, which is normally used to identify elementary school students who might struggle with speech, language, literacy, and numeracy, would further differentiate the choice of majors by college students. Far more research needs to be conducted, but results indicated that college students differentially clustered in scatterplot quadrants defined by the two screening assessments. Furthermore, several of these clusters, plus a statistical multiplier, may lead to a new understanding of students with phonological processing differences, learning disabilities, and speech and language impairments.
We developed linguistics-driven prediction models to estimate the risk of suicide. These models were generated from unstructured clinical notes taken from a national sample of U.S. Veterans Administration (VA) medical records. We created three matched cohorts: veterans who committed suicide, veterans who used mental health services and did not commit suicide, and veterans who did not use mental health services and did not commit suicide during the observation period (n = 70 in each group). From the clinical notes, we generated datasets of single keywords and multi-word phrases, and constructed prediction models using a machine-learning algorithm based on a genetic programming framework. The resulting inference accuracy was consistently 65% or more. Our data therefore suggest that computerized text analytics can be applied to unstructured medical records to estimate the risk of suicide. The resulting system could allow clinicians to screen seemingly healthy patients at the primary care level, and to continuously evaluate suicide risk among psychiatric patients.
Discussion in Postsecondary Classrooms
Source: SAGE Open
Spoken language is, arguably, the primary means by which teachers teach and students learn. Much of the literature on language in classrooms has focused on discussion that is seen as both a method of instruction and a curricular outcome. While much of the research on discussion has focused on K-12 classrooms, there is also a body of research examining the efficacy of discussion in postsecondary settings. This article provides a review of this literature in order to consider the effect of discussion on student learning in college and university classrooms, the prevalence of discussion in postsecondary settings, and the quality of discussion in these settings. In general, the results of research on the efficacy of discussion in postsecondary settings are mixed. More seriously, researchers have not been explicit about the meaning of discussion and much of what is called discussion in this body of research is merely recitation with minimal levels of student participation. Although the research on discussion in college and university classrooms is inconclusive, some implications can be drawn from this review of the research including the need for future researchers to clearly define what they mean by “discussion.”
Early Education for Dual Language Learners (PDF)
Source: Migration Policy Institute
This report profiles the population of young Dual Language Learners (DLLs), who represent nearly one-third of all US children under age 6, outlining their school readiness and patterns of achievement. The report evaluates the research on early care and education approaches that have been shown to support higher levels of language and literacy development and achievement for this child population, most but not all of whom are children of immigrants. Assessing the features of high-quality programs that have been shown to improve school readiness among the DLL population, the author identifies a number of practices that can readily be put into effect.
New GAO Report
Source: Government Accountability Office
Education Needs to Further Examine Data Collection on English Language Learners in Charter Schools. GAO-13-655R, July 17.
Building Large Collections of Chinese and English Medical Terms from Semi-Structured and Encyclopedia Websites
To build large collections of medical terms from semi-structured information sources (e.g. tables, lists, etc.) and encyclopedia sites on the web. The terms are classified into three semantic categories used in the i2b2 challenge tasks: Medical Problems, Medications, and Medical Tests. We developed two systems, one for Chinese and another for English terms. The two systems share the same methodology and use the same software with minimal language-dependent parts. We produced large collections of terms by exploiting billions of semi-structured information sources and encyclopedia sites on the Web. The standard performance metric of recall (R) is extended to three different types of recall to take the surface variability of terms into consideration: Surface Recall, Object Recall, and Surface Head Recall. We use two test sets for Chinese. For English, we use a collection of terms in the 2010 i2b2 text. Two collections of terms, one for English and the other for Chinese, have been created. The terms in these collections are classified as one of Medical Problems, Medications, or Medical Tests in the i2b2 challenge tasks. The English collection contains 49,249 (Problems), 89,591 (Medications) and 25,107 (Tests) terms, while the Chinese one contains 66,780 (Problems), 101,025 (Medications), and 15,032 (Tests) terms. The proposed method of constructing a large collection of medical terms is both efficient and effective, and, most of all, independent of language. The collections will be made publicly available.
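The distinction between Surface Recall (exact surface forms found) and Object Recall (a term "object" counts as found if any of its surface variants is found) can be sketched in a few lines. This is a toy illustration, not the paper's evaluation code; the example terms and the representation of objects as sets of variants are assumptions.

```python
# Toy sketch (not the paper's code) of Surface Recall vs Object Recall.
# A gold "object" is modeled here as a set of surface variants of one term.

def surface_recall(gold_surfaces, found):
    """Fraction of gold surface forms present in the extracted collection."""
    found = set(found)
    return sum(1 for s in gold_surfaces if s in found) / len(gold_surfaces)

def object_recall(gold_objects, found):
    """Fraction of gold objects with at least one surface variant found."""
    found = set(found)
    return sum(1 for variants in gold_objects
               if any(v in found for v in variants)) / len(gold_objects)

# Hypothetical example: two term objects, five surface forms in total.
gold_objects = [
    {"myocardial infarction", "MI", "heart attack"},
    {"hypertension", "high blood pressure"},
]
gold_surfaces = [s for obj in gold_objects for s in obj]
extracted = {"heart attack", "diabetes"}

print(surface_recall(gold_surfaces, extracted))  # 1 of 5 surfaces -> 0.2
print(object_recall(gold_objects, extracted))    # 1 of 2 objects  -> 0.5
```

Object Recall is always at least as high as Surface Recall, since finding any one variant credits the whole object.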
Source: Microsoft Research
Acronyms are abbreviations formed from the initial components of words or phrases. Acronym usage is becoming more common in web searches, email, text messages, tweets, blogs and posts. Acronyms are typically ambiguous and often disambiguated by context words. Given either just an acronym as a query or an acronym with a few context words, it is immensely useful for a search engine to know the most likely intended meanings, ranked by their likelihood. To support such online scenarios, we study the offline mining of acronyms and their meanings in this paper. For each acronym, our goal is to discover all distinct meanings and, for each meaning, compute the expanded string, its popularity score and a set of context words that indicate this meaning. Existing approaches are inadequate for this purpose. Our main insight is to leverage "co-clicks" in search engine query click logs to mine expansions of acronyms. There are several technical challenges, such as ensuring 1:1 mapping between expansions and meanings, handling of "tail meanings" and extracting context words. We present a novel, end-to-end solution that addresses the above challenges. We further describe how web search engines can leverage the mined information for prediction of intended meaning for queries containing acronyms. Our experiments show that our approach (i) discovers the meanings of acronyms with high precision and recall, (ii) significantly complements existing meanings in Wikipedia and (iii) accurately predicts intended meaning for online queries with over 90% precision.
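The co-click idea can be sketched simply: if the query "nlp" and the query "natural language processing" both lead to clicks on the same URL, the longer query whose word initials spell the acronym is a candidate expansion. The sketch below is an assumption about the general technique, not Microsoft's pipeline; the example log, queries and URLs are invented for illustration.

```python
# Minimal co-click sketch (an assumption, not the paper's system):
# group queries by clicked URL, then treat co-clicked queries whose word
# initials spell the acronym as candidate expansions.
from collections import defaultdict

def is_expansion(acronym, query):
    """True if the initials of the query's words spell out the acronym."""
    words = query.split()
    return (len(words) == len(acronym)
            and "".join(w[0] for w in words).lower() == acronym.lower())

def mine_expansions(click_log, acronym):
    """click_log: iterable of (query, clicked_url) pairs."""
    by_url = defaultdict(set)
    for query, url in click_log:
        by_url[url].add(query.lower())
    candidates = set()
    for queries in by_url.values():
        if acronym.lower() in queries:          # acronym query clicked this URL
            for q in queries:
                if is_expansion(acronym, q):    # co-clicked expansion query
                    candidates.add(q)
    return candidates

# Hypothetical click log: the same acronym resolves to two distinct meanings.
log = [
    ("nlp", "example.org/nlp-intro"),
    ("natural language processing", "example.org/nlp-intro"),
    ("nlp", "example.org/therapy"),
    ("neuro linguistic programming", "example.org/therapy"),
]
print(mine_expansions(log, "NLP"))
```

A real system would additionally score expansions by click frequency and attach context words per meaning, which is where the "tail meanings" and 1:1-mapping challenges the abstract mentions arise.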
Dating the Origin of Language Using Phonemic Diversity
Source: PLoS ONE
Language is a key adaptation of our species, yet we do not know when it evolved. Here, we use data on language phonemic diversity to estimate a minimum date for the origin of language. We take advantage of the fact that phonemic diversity evolves slowly and use it as a clock to calculate how long the oldest African languages would have to have been around in order to accumulate the number of phonemes they possess today. We use a natural experiment, the colonization of Southeast Asia and the Andaman Islands, to estimate the rate at which phonemic diversity increases through time. Using this rate, we estimate that present-day languages date back to the Middle Stone Age in Africa. Our analysis is consistent with the archaeological evidence suggesting that complex human behavior evolved during the Middle Stone Age in Africa, and does not support the view that language is a recent adaptation that sparked the dispersal of humans out of Africa. While some of our assumptions require testing and our results rely at present on a single case study, our analysis constitutes the first estimate of when language evolved that is directly based on linguistic data.
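The "phonemic clock" logic reduces to simple rate arithmetic: calibrate a rate of phoneme accumulation from a dated colonization event, then divide the observed phoneme surplus by that rate. The numbers below are hypothetical, chosen only to show the calculation, not the paper's actual estimates.

```python
# Toy "phonemic clock" arithmetic (hypothetical numbers, not the paper's):
# minimum age = accumulated phoneme surplus / empirically calibrated rate.

def minimum_age_years(current_phonemes, baseline_phonemes, rate_per_millennium):
    """Years needed to accumulate the observed surplus at the given rate."""
    surplus = current_phonemes - baseline_phonemes
    return surplus / rate_per_millennium * 1000  # millennia -> years

# Suppose a lineage gained 10 phonemes over the ~45,000 years since a dated
# colonization event: rate ~ 10 / 45 phonemes per millennium.
rate = 10 / 45

# A language with 140 phonemes against a 120-phoneme baseline would then
# imply a minimum age of about 90,000 years.
print(minimum_age_years(140, 120, rate))
```

Because the calibration rate is a minimum (diversity could have accumulated faster at other times), the resulting date is a lower bound, which is why the abstract speaks of a minimum date for the origin of language.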
A Comparative Study of PDF Generation Methods: Measuring Loss of Fidelity When Converting Arabic and Persian MS Word Files to PDF
Converting files to Portable Document Format (PDF) is popular due to the format’s many advantages. For example, PDF allows an author to control or preserve the rendering of a digital document, distribute it to other systems, and ensure that it displays in a viewer as intended.
From the perspective of Human Language Technology (HLT), however, PDFs are problematic. PDF is a display-oriented digital document format; the point of PDF is to preserve the appearance of a document, not to preserve the original electronic text. We observed errors in PDF-extracted text indicating that either the PDF generator or extractor, or both, mishandled the document structure, character data, and/or entire textual objects. And we learned that other HLT researchers reported data loss when extracting electronic text from PDFs. This motivated further study of digital document data exchange using PDFs.
MITRE conducted an exploratory study of data exchange using PDF in order to investigate the data loss phenomenon. We limited our study to Middle Eastern electronic text: specifically Arabic and Persian. The study included a test for scoring PDF generation methods—(a) using a common, best-practice setup to generate PDFs and extract text, and (b) using character accuracy to quantify the quality of PDF-extracted text. We ranked 8 methods according to the resulting accuracy scores. The 8 methods map to 3 core PDF generation classes. At best, the Microsoft Word class resulted in 42% Overall Accuracy. Best scores for the PDFMaker and Acrobat Distiller/PScript5.dll classes were 95% and 96%, respectively.
This paper explains our tests and discusses the results, including evidence that using PDF for data exchange of typical Arabic and Persian documents results in a loss of important electronic text content. This loss confuses human language technologies such as search engines, machine translation engines, computer-assisted translation tools, named entity recognizers, and information extractors.
Furthermore, most of the spurious newlines, spurious spaces in tokens, spurious character substitutions, and entity errors observed in the study were due to the PDF generation method rather than the PDF text extractor. Thus, using a common configuration to convert reliable electronic text to PDF for data exchange causes irretrievable loss of electronic text on the receiving end.
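The character-accuracy scoring described above can be sketched with the usual OCR-style formulation: one minus the edit distance between reference and extracted text, normalized by the reference length. This is an assumption about the metric's general form, not MITRE's exact tool, and the Persian example strings are invented.

```python
# Sketch of character accuracy (an assumed OCR-style formulation, not
# MITRE's exact scoring tool): 1 - edit_distance / reference_length.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def char_accuracy(reference, extracted):
    """Character accuracy of extracted text against a reference, floored at 0."""
    return max(0.0, 1.0 - levenshtein(reference, extracted) / len(reference))

ref = "سلام دنیا"   # hypothetical reference Persian text (9 characters)
ext = "سلامدنیا"    # extraction lost the space: one error over 9 characters
print(round(char_accuracy(ref, ext), 2))  # -> 0.89
```

A spurious-space or dropped-diacritic error in an Arabic-script token costs one edit each under this metric, which is how systematic generator-side damage drives the large accuracy gaps (42% versus 95-96%) reported between PDF generation classes.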
+ Full Paper (PDF)