Learning about health and medicine from Internet data

March 18, 2015

Source: Microsoft Research

Surveys show that around 70% of US Internet users consult the Internet when they require medical information. People seek this information using both traditional search engines and social media. The information created during the search process offers an unprecedented opportunity for applications to monitor and improve the quality of life of people with a variety of medical conditions. In recent years, research in this area has addressed public-health questions such as the effect of media on the development of anorexia, developed tools for measuring influenza rates and assessing drug safety, and examined the effects of health information on individual wellbeing. This tutorial will show how Internet data can facilitate medical research, providing an overview of the state of the art in this area. During the tutorial we will discuss the information that can be gleaned from a variety of Internet data sources, including social media, search engines, and specialized medical websites. We will provide an overview of analysis methods used in recent literature, and show how results can be evaluated using publicly available health information and online experimentation. Finally, we will discuss ethical and privacy issues and possible technological solutions. This tutorial is intended for researchers of user-generated content who are interested in applying their knowledge to improve health and medicine.

Accessible Crowdwork? Understanding the Value in and Challenge of Microtask Employment for People with Disabilities

March 2, 2015

Source: Microsoft Research

We present the first formal study of crowdworkers who have disabilities, via in-depth open-ended interviews with 17 people (disabled crowdworkers and job coaches for people with disabilities) and a survey of 631 adults with disabilities. Our findings establish that people with a variety of disabilities currently participate in the crowd labor marketplace, despite challenges such as crowdsourcing workflow designs that inadvertently prohibit participation by, and may negatively affect the worker reputations of, people with disabilities. Despite these challenges, we find that crowdwork potentially offers people with disabilities different opportunities relative to the normative office environment, such as job flexibility and no need to rely on public transit. We close by identifying several ways in which crowd labor platform operators and/or individual task requestors could improve the accessibility of this increasingly important form of employment.

Supercomputers: The Amazing Race

January 13, 2015

Source: Microsoft Research (Gordon Bell)

The “ideal supercomputer” has an infinitely fast clock and executes a single-instruction-stream program operating on data stored in an infinitely large and fast single memory. Backus established the von Neumann programming model with FORTRAN. Supercomputers have evolved in steps: increasing processor speed; processing vectors; adding processors for a program held in the single memory of a monocomputer; and interconnecting multiple computers over which a distributed program runs in parallel. Thus, supercomputing has evolved from the hardware engineering design challenge of the monocomputer in the Cray Era (1960-1995) to the challenge of creating programs that operate on distributed (mono)computers in the Multicomputer Era (1985-present).

Identifying Presentation Styles in Online Educational Videos

January 6, 2015

Source: Microsoft Research

The rapid growth of online educational videos has resulted in huge redundancy. The same underlying content is often available in multiple videos with varying quality, presenter, and presentation style (slide show, whiteboard presentation, demo, etc.). Because there are so many videos on the same content, it is important to retrieve videos that are attuned to user preferences. While several aspects drive user engagement, we focus on the presentation style of the video. Based on a large-scale manual study, we identify the 11 dominant presentation styles that are typically employed. We propose a reference algorithm that combines a set of 3-Way Decision Forests with probabilistic fusion and uses a large set of image, face, and motion features. We analyze our empirical results to provide an understanding of the difficulties of the problem and to highlight directions for future research on this new application. We also make the data available.

Optimizing Human Computation to Save Time and Money

January 6, 2015

Source: Microsoft Research

Crowdsourcing is increasingly being used to provide answers to online polls and surveys. However, existing systems, while taking care of the mechanics of attracting crowd workers, poll building, and payment, generally provide little by way of cost management (e.g., working with a tight budget), time management (e.g., obtaining results as quickly as possible), and controlling the margin of error (e.g., working with a sample population that differs substantially from general census statistics). These problems create significant pain points for those wanting to run large-scale surveys, such as people polling for political campaigns, marketing professionals, and the like.

Our work unlocks the possibility of large-scale polling on a budget through the use of novel optimization strategies. Our work is based on InterPoll, a platform for programming crowdsourced polls. In this paper, we present three static and three runtime optimizations for InterPoll polls represented as LINQ queries. The former share some similarities with traditional compiler optimizations, while the latter borrow insights from databases and real-life polling strategies.

These optimizations lead to significant improvements in practice. In our experiments we observed tenfold savings in survey cost and time savings of as much as 20 hours for some of the queries.

Search and Breast Cancer: On Disruptive Shifts of Attention over Life Histories of an Illness

November 24, 2014

Source: Microsoft Research

We seek to understand the evolving needs of people who are faced with a life-changing medical diagnosis, based on analyses of queries extracted from an anonymized search query log. Focusing on breast cancer, we manually tag a set of Web searchers as showing disruptive shifts in focus of attention and long-term patterns of search behavior consistent with the diagnosis and treatment of breast cancer. We build and apply probabilistic classifiers to detect these searchers from multiple sessions and to detect the timing of diagnosis, using a variety of temporal and statistical features. We explore the changes in information seeking over time, before and after an inferred diagnosis of breast cancer, by aligning multiple searchers by the likely time of diagnosis. We automatically identify 1,700 candidate searchers with an estimated 90% precision, and we predict the day of diagnosis within 15 days with 88% accuracy. We show that the geographic and demographic attributes of searchers identified with high probability are strongly correlated with ground truth of reported incidence rates. We then analyze the content of queries over time from searchers for whom diagnosis was predicted, using a detailed ontology of cancer-related search terms. Our analysis reveals the rich temporal structure of the evolving queries of people likely diagnosed with breast cancer. Finally, we focus on subtypes of illness based on inferred stages of cancer and show clinically relevant dynamics of information seeking based on the dominant stage expressed by searchers.

Turk-Life in India

November 19, 2014

Source: Microsoft Research

Previous studies of Amazon Mechanical Turk (AMT), the best-known marketplace for microtasks, show that the largest population of workers on AMT is US-based, while the second largest is based in India. In this paper, we present insights from an ethnographic study conducted in India to introduce some of these workers, or ‘Turkers’: who they are, how they work, and what turking means to them. We examine the work they do to maintain their reputations and their work-life balance. In doing so, we illustrate how AMT’s design affects turk-work in practice. Understanding the ‘lived work’ of crowdwork is a valuable first step for technology design.


