Archive for the ‘Microsoft Research’ Category

Structured Information Extraction from Natural Disaster Events on Twitter

September 29, 2014
Source: Microsoft Research

As soon as a natural disaster occurs, users are eager to know more about it. However, search engines currently provide only a ten-blue-links interface for queries related to such events. The relevance of results for such queries can be significantly improved if users are shown a structured summary of the fresh events related to them. This would not only reduce the number of clicks users need to reach the relevant information but would also help them stay updated with more fine-grained, attribute-level information.

Twitter is a great source that can be exploited for obtaining such fine-grained structured information for fresh natural disaster events. Such events are often reported on Twitter much earlier than on other news media. However, extracting such structured information from tweets is challenging because (1) tweets are noisy and ambiguous; (2) there is no well-defined schema for the various types of natural disaster events; (3) it is not trivial to extract attribute-value pairs and facts from unstructured text; and (4) it is difficult to find good mappings between extracted attributes and attributes in the event schema.

We propose algorithms to extract attribute-value pairs, and also devise novel mechanisms to map such pairs to manually generated schemas for natural disaster events. Besides the tweet text, we also leverage text from URL links in the tweets to fill such schemas. Our schemas are temporal in nature and the values are updated whenever fresh information flows in from human sensors on Twitter. Evaluation on ∼58000 tweets for 20 events shows that our system can fill such event schemas with an F1 of ∼0.6.
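To make the idea of a temporal schema and its F1 evaluation concrete, here is a minimal sketch in Python. It is not the authors' system: the `ALIASES` table, the `EventSchema` class, the "newest timestamp wins" update rule, and the exact-match F1 are all assumptions chosen for illustration, and the sample values are made up.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from extracted attribute names to schema slots.
ALIASES = {
    "magnitude": "magnitude",
    "mag": "magnitude",
    "death toll": "deaths",
    "killed": "deaths",
}


@dataclass
class EventSchema:
    # slot name -> (timestamp, value); assumed rule: the newest report wins.
    slots: dict = field(default_factory=dict)

    def update(self, timestamp, attribute, value):
        """Map an extracted pair to a slot; return False if it cannot be mapped."""
        slot = ALIASES.get(attribute.lower())
        if slot is None:
            return False
        current = self.slots.get(slot)
        if current is None or timestamp >= current[0]:
            self.slots[slot] = (timestamp, value)
        return True

    def values(self):
        return {slot: value for slot, (_, value) in self.slots.items()}


def f1_score(predicted, gold):
    """Exact-match F1 between predicted and gold slot values."""
    correct = sum(1 for slot, value in predicted.items() if gold.get(slot) == value)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


schema = EventSchema()
schema.update(1, "mag", "6.0")
schema.update(2, "Death toll", "12")
schema.update(3, "killed", "15")       # fresher report replaces the older value
schema.update(0, "magnitude", "5.8")   # stale report is ignored
```

In this sketch a slot is only overwritten by a report that is at least as recent as the stored one, which is one simple way to read "values are updated whenever fresh information flows in".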

Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers

September 26, 2014
Source: Microsoft Research

We aim to provide table answers to keyword queries using a knowledge base. For queries referring to multiple entities, like “Washington cities population” and “Mel Gibson movies”, it is better to represent each relevant answer as a table which aggregates a set of entities or joins of entities within the same table scheme or pattern. In this paper, we study how to find highly relevant patterns in a knowledge base for user-given keyword queries to compose table answers. A knowledge base is modeled as a directed graph called a knowledge graph, where nodes represent its entities and edges represent the relationships among them. Each node/edge is labeled with type and text. A pattern is an aggregation of subtrees which contain all keywords in the texts and have the same structure and types on nodes/edges. We propose efficient algorithms to find patterns that are relevant to the query for a class of scoring functions. We show the hardness of the problem in theory, and propose path-based indexes that are affordable in memory. Two query-processing algorithms are proposed: one is fast in practice for small queries (with small numbers of patterns as answers) by utilizing the indexes; the other is better in theory, with running time linear in the sizes of indexes and answers, and can handle large queries better. We also conduct an extensive experimental study to compare our approaches with a naive adaptation of known techniques.
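The notion of a pattern can be illustrated with a small Python sketch. It is a deliberate simplification and not the paper's algorithm: it enumerates only short paths rather than general subtrees, uses brute force instead of the path-based indexes, and does no scoring. The toy graph, its numbers, and the function names are hypothetical.

```python
from collections import defaultdict

# Toy knowledge graph (hypothetical data): node id -> (type, text).
NODES = {
    "wa": ("State", "Washington"),
    "sea": ("City", "Seattle"),
    "spo": ("City", "Spokane"),
    "p1": ("Number", "650000"),
    "p2": ("Number", "210000"),
}
# Directed, labeled edges: (source, label, target).
EDGES = [
    ("wa", "cities", "sea"),
    ("wa", "cities", "spo"),
    ("sea", "population", "p1"),
    ("spo", "population", "p2"),
]


def enumerate_paths(max_edges=2):
    """Yield (nodes, labels) for every directed path with at most max_edges edges."""
    outgoing = defaultdict(list)
    for source, label, target in EDGES:
        outgoing[source].append((label, target))

    def extend(nodes, labels):
        yield nodes, labels
        if len(labels) < max_edges:
            for label, target in outgoing[nodes[-1]]:
                yield from extend(nodes + (target,), labels + (label,))

    for node in NODES:
        yield from extend((node,), ())


def find_patterns(query):
    """Group keyword-matching paths by their type signature; each group is a table."""
    keywords = query.lower().split()
    tables = defaultdict(list)
    for nodes, labels in enumerate_paths():
        texts = [NODES[n][1].lower() for n in nodes] + [l.lower() for l in labels]
        # Keep the path only if every keyword occurs in some node or edge text.
        if all(any(k in t for t in texts) for k in keywords):
            pattern = (tuple(NODES[n][0] for n in nodes), labels)
            tables[pattern].append(tuple(NODES[n][1] for n in nodes))
    return dict(tables)


result = find_patterns("Washington cities population")
```

Here both matching paths share the signature State, cities, City, population, Number, so they are aggregated into one table with a row for each city.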

Enabling Physical Analytics in Retail Stores Using Smart Glasses

September 9, 2014
Source: Microsoft Research

We consider the problem of tracking physical browsing by users in indoor spaces such as retail stores. Analogous to online browsing, where users choose to go to certain webpages, dwell on a subset of pages of interest to them, and click on links of interest while ignoring others, we can draw parallels in the physical setting, where a user might “walk” purposefully to a section of interest, “dwell” there for a while, “gaze” at specific items, and “reach out” for the ones that they wish to examine more closely and possibly purchase.

As our first contribution, we design techniques to track each of these elements of physical browsing using a combination of first-person vision enabled by smart glasses and inertial sensing using both the glasses and a smartphone. We address key challenges, including energy efficiency, by using the less expensive inertial sensors. Second, during gazing, we present a method for identifying the item(s) within view that the user is likely to focus on, based on measuring the orientation of the user’s head.

Finally, unlike in the online context, where every webpage is just a click away, proximity is important in the physical browsing setting. To enable the tracking of nearby items, even if outside the field of view, we use data gathered from smart-glasses-enabled users to infer the product layout using a novel technique called AutoLayout. Further, we show how such inferences made from a small population of smart-glasses-enabled users could aid in tracking the physical browsing by the many smartphone-only users.

Password Portfolios and the Finite-Effort User: Sustainably Managing Large Numbers of Accounts

August 7, 2014
Source: Microsoft Research

We explore how to manage a portfolio of passwords. We review why mandating exclusively strong passwords with no re-use gives users an impossible task as portfolio size grows. We find that approaches justified by loss-minimization alone, and those that ignore important attack vectors (e.g., vectors exploiting re-use), are amenable to analysis but unrealistic. In contrast, we propose, model and analyze portfolio management under a realistic attack suite, with an objective function costing both loss and user effort. Our findings directly challenge accepted wisdom and conventional advice. We find, for example, that a portfolio strategy ruling out weak passwords or password re-use is sub-optimal. We give an optimal solution for how to group accounts for re-use, and model-based principles for portfolio management.

Inverse Privacy

July 22, 2014
Source: Microsoft Research

We say that an item of your personal information is private if you have it but nobody else does. It is inversely private if somebody has it but you do not. We analyze the provenance of inverse privacy and argue that technology and appropriate public policy can reduce inverse privacy to a minimum.

Using Ethical-Response Surveys to Identify Sources of Disapproval and Concern with Facebook’s Emotional Contagion Experiment and Other Controversial Studies

July 15, 2014
Source: Microsoft Research

We surveyed 3570 workers on Amazon’s Mechanical Turk to gauge their ethical response to five scenarios describing scientific experiments, including one scenario describing Facebook’s emotional contagion experiment. We will post an update of this paper containing the results and analysis on or after 12:01 AM Pacific on Monday, July 14.

Circumlocution in Diagnostic Medical Queries

July 9, 2014
Source: Microsoft Research

Circumlocution is when many words are used to describe what could be said with fewer, e.g., “a machine that takes moisture out of the air” instead of “dehumidifier”. Web search is a perfect backdrop for circumlocution, since people often struggle to name what they seek. In some domains, not knowing the correct term can have a significant impact on the search results that are retrieved. We study the medical domain, where professional medical terms are not commonly known and where not knowing the correct term can affect the accuracy of surfaced information, the escalation of anxiety, and ultimately the medical care sought. Given a free-form colloquial health search query, our objective is to find the underlying professional medical term. The problem is complicated by the fact that people issue quite varied queries to describe what they have. Machine-learning algorithms can be brought to bear on the problem, but there are two key complexities: creating high-quality training data and identifying predictive features. To our knowledge, no prior work has been able to crack this important problem due to the lack of training data. We give novel solutions and demonstrate their efficacy via extensive experiments, greatly improving over the prior art.

