Archive for the ‘Microsoft Research’ Category

Search and Breast Cancer: On Disruptive Shifts of Attention over Life Histories of an Illness

November 24, 2014

Source: Microsoft Research

We seek to understand the evolving needs of people who are faced with a life-changing medical diagnosis based on analyses of queries extracted from an anonymized search query log. Focusing on breast cancer, we manually tag a set of Web searchers as showing disruptive shifts in focus of attention and long-term patterns of search behavior consistent with the diagnosis and treatment of breast cancer. We build and apply probabilistic classifiers to detect these searchers from multiple sessions and to detect the timing of diagnosis, using a variety of temporal and statistical features. We explore the changes in information-seeking over time before and after an inferred diagnosis of breast cancer by aligning multiple searchers by the likely time of diagnosis. We automatically identify 1700 candidate searchers with an estimated 90% precision, and we predict the day of diagnosis within 15 days with 88% accuracy. We show that the geographic and demographic attributes of searchers identified with high probability are strongly correlated with the ground truth of reported incidence rates. We then analyze the content of queries over time from searchers for whom diagnosis was predicted, using a detailed ontology of cancer-related search terms. Our analysis reveals the rich temporal structure of the evolving queries of people likely diagnosed with breast cancer. Finally, we focus on subtypes of illness based on inferred stages of cancer and show clinically relevant dynamics of information seeking based on the dominant stage expressed by searchers.
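
The alignment step and the "within 15 days" accuracy measure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the weekly bucketing, the topic labels, and the toy dates below are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import date


def align_by_diagnosis(logs, diagnosis_day, bucket_days=7):
    """Count query topics per time bucket relative to each searcher's diagnosis day.

    logs: {searcher_id: [(date, topic), ...]}  (hypothetical input format)
    diagnosis_day: {searcher_id: date}
    """
    buckets = defaultdict(Counter)
    for searcher, events in logs.items():
        day_zero = diagnosis_day[searcher]
        for day, topic in events:
            # Floor division keeps negative buckets for days before diagnosis.
            offset = (day - day_zero).days // bucket_days
            buckets[offset][topic] += 1
    return buckets


def within_tolerance(predicted, actual, tolerance_days=15):
    """Fraction of searchers whose predicted day is within the tolerance."""
    hits = sum(
        abs((predicted[s] - actual[s]).days) <= tolerance_days for s in actual
    )
    return hits / len(actual)


# Toy data, invented purely for illustration.
logs = {
    "s1": [(date(2014, 3, 5), "symptoms"), (date(2014, 3, 12), "treatment")],
    "s2": [(date(2014, 5, 28), "symptoms"), (date(2014, 6, 9), "treatment")],
}
labeled = {"s1": date(2014, 3, 10), "s2": date(2014, 6, 1)}
predicted = {"s1": date(2014, 3, 20), "s2": date(2014, 6, 20)}

buckets = align_by_diagnosis(logs, labeled)
accuracy = within_tolerance(predicted, labeled)
```

Here bucket -1 covers the seven days before the diagnosis day and bucket 0 the seven days starting on it, so searchers diagnosed months apart can be compared on a common timeline.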


Turk-Life in India

November 19, 2014

Source: Microsoft Research

Previous studies on Amazon Mechanical Turk (AMT), the most well-known marketplace for microtasks, show that the largest population of workers on AMT is U.S. based, while the second largest is based in India. In this paper, we present insights from an ethnographic study conducted in India to introduce some of these workers or ‘Turkers’ – who they are, how they work and what turking means to them. We examine the work they do to maintain their reputations and their work-life balance. In doing this, we illustrate how AMT’s design practically impacts on turk-work. Understanding the ‘lived work’ of crowdwork is a valuable first step for technology design.

Urban Computing: Concepts, Methodologies, and Applications

November 10, 2014

Source: Microsoft Research

Urbanization’s rapid progress has modernized many people’s lives, but it has also engendered major problems such as traffic congestion, energy consumption, and pollution. Urban computing aims to tackle these issues by using the data that has been generated in cities, e.g., traffic flow, human mobility, and geographical data. Urban computing connects urban sensing, data management, data analytics, and service provision into a recurrent process for the unobtrusive and continuous improvement of people’s lives, city operation systems, and the environment. Urban computing is an interdisciplinary field where computer science meets conventional city-related fields, such as transportation, civil engineering, environment, economy, ecology, and sociology, in the context of urban spaces. This article first introduces the concept of urban computing, discussing its general framework and key challenges from the perspective of computer science. Second, we classify the applications of urban computing into seven categories: urban planning, transportation, the environment, energy, social, economy, and public safety and security, presenting representative scenarios in each category. Third, we summarize the typical technologies needed in urban computing in four groups: urban sensing, urban data management, knowledge fusion across heterogeneous data, and urban data visualization. Finally, we look ahead to the future of urban computing, suggesting a few research topics that are somewhat missing in the community.

Privacy Considerations for a Pervasive Eye Tracking World

November 5, 2014

Source: Microsoft Research

Multiple vendors now provide relatively inexpensive desktop eye and gaze tracking devices. With miniaturization and decreasing manufacturing costs, gaze trackers will follow the path of webcams, becoming ubiquitous and inviting many of the same privacy concerns. However, whereas the privacy loss from webcams may be obvious to the user, gaze tracking is more opaque and deserves special attention. In this paper, we review current research in gaze tracking and pupillometry and argue that gaze data should be protected by both policy and good data hygiene.

Ranking Twitter Discussion Groups

November 4, 2014

Source: Microsoft Research

A discussion group is a repeated, synchronized conversation organized around a specific topic. Groups are extremely valuable to the attendees, creating a sense of community among like-minded users. While groups may involve many users, there are many outside the group who would benefit from participation. However, finding the right group is not easy given their number and their topical overlap. We study the following problem: given a search query, find a good ranking of discussion groups. We describe a random walk model for how users select groups: starting with a group relevant to the query, a hypothetical user repeatedly selects an authoritative user in the group and then moves to a group according to what the authoritative user prefers. The stationary distribution of this walk yields a group ranking. We analyze this random walk model, demonstrating that it enjoys many natural properties of a desirable ranking algorithm. We study groups on Twitter, where conversations can be organized via pre-designated hashtags. These groups are an emerging phenomenon, and there are at least tens of thousands in existence today according to our calculations. Via an extensive collection of experiments on one year of tweets, we show that our model effectively ranks groups, outperforming several baseline solutions.
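
The group-to-user-to-group walk described above can be sketched with plain power iteration. This is an illustrative approximation, not the paper's exact model: the restart probability that returns the walker to query-relevant groups, the weight dictionaries, and the hashtag and user names are all assumptions.

```python
def normalize(weights):
    """Scale non-negative weights to sum to 1 (empty dict if all are zero)."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()} if total > 0 else {}


def rank_groups(authority, preference, relevance, restart=0.15, iters=100):
    """Approximate the stationary distribution of a group -> user -> group walk.

    authority[g][u]:  how authoritative user u is within group g
    preference[u][g]: how strongly user u prefers group g
    relevance[g]:     query relevance of g; must list every group and
                      contain at least one positive value
    """
    start = normalize(relevance)
    dist = dict(start)
    for _ in range(iters):
        nxt = {g: 0.0 for g in start}
        for g, mass in dist.items():
            # Step 1: pick a user in group g in proportion to authority.
            for u, p_user in normalize(authority.get(g, {})).items():
                # Step 2: move to a group that this user prefers.
                for h, p_group in normalize(preference.get(u, {})).items():
                    nxt[h] += (1 - restart) * mass * p_user * p_group
        # Mass from restarts and dead ends returns to query-relevant groups.
        leftover = 1.0 - sum(nxt.values())
        for g, p in start.items():
            nxt[g] += leftover * p
        dist = nxt
    return sorted(dist.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for illustration only.
authority = {
    "#chatA": {"ann": 3, "bob": 1},
    "#chatB": {"bob": 2, "cat": 2},
    "#chatC": {"cat": 1},
}
preference = {
    "ann": {"#chatA": 1, "#chatB": 1},
    "bob": {"#chatB": 1},
    "cat": {"#chatB": 3, "#chatC": 1},
}
relevance = {"#chatA": 1.0, "#chatB": 0.0, "#chatC": 0.0}

ranking = rank_groups(authority, preference, relevance)
```

In this toy setup the walk starts at the only query-relevant group, yet a different group ends up ranked first because the authoritative users keep steering the walker toward it.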

Structured Information Extraction from Natural Disaster Events on Twitter

September 29, 2014

Source: Microsoft Research

As soon as natural disaster events happen, users are eager to know more about them. However, search engines currently provide a ten-blue-links interface for queries related to such events. The relevance of results for such queries can be significantly improved if users are shown a structured summary of the fresh events related to such queries. This would not only reduce the number of clicks users need to reach the relevant information but would also help them stay updated with more fine-grained, attribute-level information.

Twitter is a great source that can be exploited for obtaining such fine-grained structured information for fresh natural disaster events. Such events are often reported on Twitter much earlier than on other news media. However, extracting such structured information from tweets is challenging because: 1. tweets are noisy and ambiguous; 2. there is no well-defined schema for various types of natural disaster events; 3. it is not trivial to extract attribute-value pairs and facts from unstructured text; and 4. it is difficult to find good mappings between extracted attributes and attributes in the event schema.

We propose algorithms to extract attribute-value pairs, and also devise novel mechanisms to map such pairs to manually generated schemas for natural disaster events. Besides the tweet text, we also leverage text from URL links in the tweets to fill such schemas. Our schemas are temporal in nature and the values are updated whenever fresh information flows in from human sensors on Twitter. Evaluation on ∼58000 tweets for 20 events shows that our system can fill such event schemas with an F1 of ∼0.6.
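
As a rough illustration of mapping extracted attribute-value pairs onto a hand-written schema whose values are refreshed as newer information arrives, consider the sketch below. It is not the authors' algorithm: the alias table, the slot names, the integer timestamps, and the sample pairs are hypothetical, and a simple alias lookup stands in for the mapping mechanisms the paper proposes.

```python
# Hypothetical hand-written schema for an earthquake event:
# each slot lists the raw attribute names that should map to it.
SCHEMA_ALIASES = {
    "magnitude": {"magnitude", "mag", "richter"},
    "location": {"location", "epicenter"},
    "casualties": {"casualties", "deaths"},
}


def map_attribute(raw):
    """Return the schema slot for a raw attribute name, or None if unmapped."""
    raw = raw.lower().strip()
    for slot, aliases in SCHEMA_ALIASES.items():
        if raw in aliases:
            return slot
    return None


def fill_schema(pairs):
    """Fill schema slots from (timestamp, raw_attribute, value) triples.

    Triples are processed in time order, so a later value overwrites
    an earlier one for the same slot.
    """
    filled = {}
    for timestamp, raw, value in sorted(pairs, key=lambda p: p[0]):
        slot = map_attribute(raw)
        if slot is not None:
            filled[slot] = (value, timestamp)
    return filled


# Toy extracted pairs, invented for illustration
# (timestamps are minutes since the first report).
pairs = [
    (5, "Mag", "6.1"),
    (1, "richter", "5.9"),
    (3, "epicenter", "offshore"),
    (4, "retweets", "120"),
]

schema = fill_schema(pairs)
```

The unmapped attribute is dropped, and the magnitude slot ends up holding the most recent of the two reported values.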

Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers

September 26, 2014

Source: Microsoft Research

We aim to provide table answers to keyword queries using a knowledge base. For queries referring to multiple entities, like “Washington cities population” and “Mel Gibson movies”, it is better to represent each relevant answer as a table that aggregates a set of entities or joins of entities within the same table scheme or pattern. In this paper, we study how to find highly relevant patterns in a knowledge base for user-given keyword queries to compose table answers. A knowledge base is modeled as a directed graph called a knowledge graph, where nodes represent its entities and edges represent the relationships among them. Each node/edge is labeled with a type and text. A pattern is an aggregation of subtrees that contain all keywords in their texts and have the same structure and types on nodes/edges. We propose efficient algorithms to find patterns that are relevant to the query for a class of scoring functions. We show the hardness of the problem in theory, and propose path-based indexes that are affordable in memory. Two query-processing algorithms are proposed: one is fast in practice for small queries (with small numbers of patterns as answers) by utilizing the indexes; the other is better in theory, with running time linear in the sizes of indexes and answers, and can handle large queries better. We also conduct an extensive experimental study to compare our approaches with a naive adaptation of known techniques.
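
To make the notion of a pattern concrete, the sketch below groups depth-one subtrees of a tiny knowledge graph by their type signature, so that each signature yields one table. It is a deliberately simplified illustration rather than the paper's algorithms: it uses no indexes or scoring, assumes keywords are already lowercased and stemmed, and relies on an invented graph with placeholder values.

```python
from collections import defaultdict


def matches(keyword, *texts):
    """True if the (lowercased) keyword occurs in any of the given texts."""
    return any(keyword in text.lower() for text in texts)


def find_patterns(keywords, node_type, edges):
    """Group depth-one subtrees that cover all keywords by type signature.

    node_type: {node_text: type}; edges: [(source, label, target)].
    Returns {signature: [row, ...]}, where each row is one table line.
    """
    out_edges = defaultdict(list)
    for src, label, dst in edges:
        out_edges[src].append((label, dst))

    tables = defaultdict(list)
    for root, root_type in node_type.items():
        branches = set()
        covered = True
        for kw in keywords:
            if matches(kw, root, root_type):
                continue  # keyword is covered by the root node itself
            # Simplification: take the first outgoing edge that covers it.
            hit = next(
                ((label, dst) for label, dst in out_edges[root]
                 if matches(kw, label, dst, node_type[dst])),
                None,
            )
            if hit is None:
                covered = False
                break
            branches.add(hit)
        if not covered:
            continue
        ordered = sorted(branches)
        signature = (root_type,
                     tuple((label, node_type[dst]) for label, dst in ordered))
        tables[signature].append((root,) + tuple(dst for _, dst in ordered))
    return dict(tables)


# Toy knowledge graph; population values are placeholders, not real data.
node_type = {
    "Seattle": "City", "Spokane": "City", "Portland": "City",
    "Washington": "State", "Oregon": "State",
    "pop(Seattle)": "Number", "pop(Spokane)": "Number",
    "pop(Portland)": "Number",
}
edges = [
    ("Seattle", "located_in", "Washington"),
    ("Seattle", "population", "pop(Seattle)"),
    ("Spokane", "located_in", "Washington"),
    ("Spokane", "population", "pop(Spokane)"),
    ("Portland", "located_in", "Oregon"),
    ("Portland", "population", "pop(Portland)"),
]

tables = find_patterns(["washington", "city", "population"], node_type, edges)
```

Both matching cities share the signature City with a located_in edge to a State and a population edge to a Number, so they become two rows of the same table, while the city in the other state is excluded because no part of its subtree covers the first keyword.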

