Archive for the ‘Microsoft Research’ Category

Information Retrieval with Verbose Queries

May 23, 2015

Source: Microsoft Research

Recently, the focus of many novel search applications has shifted from short keyword queries to verbose natural language queries. Examples include question answering systems and dialogue systems, voice search on mobile devices, and entity search engines like Facebook’s Graph Search or Google’s Knowledge Graph. However, the performance of textbook information retrieval techniques on such verbose queries is not as good as on their shorter counterparts. Thus, effective handling of verbose queries has become a critical factor for adoption of information retrieval techniques in this new breed of search applications. Over the past decade, the information retrieval community has deeply explored the problem of transforming natural language verbose queries into more effective structural representations, using operations like reduction, weighting, expansion, reformulation and segmentation. However, thus far, there has not been a coherent and organized tutorial on this topic. In this tutorial, we aim to put together various research pieces of the puzzle, provide a comprehensive and structured overview of various proposed methods, and also list various application scenarios where effective verbose query processing can make a significant difference.

The Emerging Role of Data Scientists on Software Development Teams

April 20, 2015

Source: Microsoft Research

Creating and running software produces large amounts of raw data about the development process and the customer usage, which can be turned into actionable insight with the help of skilled data scientists. Unfortunately, data scientists with the analytical and software engineering skills to analyze these large data sets have been hard to come by; only recently have software companies started to develop competencies in software-oriented data analytics. To understand this emerging role, we interviewed data scientists across several product groups at Microsoft. In this paper, we describe their education and training background, their raison d’être in software engineering contexts, and the type of problems on which they work. We identify five distinct working styles of data scientists and describe a set of strategies that they employ to increase the impact and actionability of their work.

Navigating Controversy as a Complex Search Task

March 31, 2015

Source: Microsoft Research

Seeking information on a controversial topic is often a complex task, for both the user and the search engine. There are multiple subtleties involved with information seeking on controversial topics. Here we discuss some of the challenges in addressing these complex tasks, describing the spectrum from cases where there is a clear right answer through fact disputes to moral debates, and we discuss cases where search queries have a measurable effect on the well-being of people. We briefly survey the current state of the art and the many open questions remaining, including both technical challenges and the possible ethical implications for search engine algorithms.

Learning about health and medicine from Internet data

March 18, 2015

Source: Microsoft Research

Surveys show that around 70% of US Internet users consult the Internet when they require medical information. People seek this information both through traditional search engines and via social media. The information created by this search process offers an unprecedented opportunity for applications to monitor and improve the quality of life of people with a variety of medical conditions. In recent years, research in this area has addressed public-health questions such as the effect of media on the development of anorexia, developed tools for measuring influenza rates and assessing drug safety, and examined the effects of health information on individual wellbeing. This tutorial will show how Internet data can facilitate medical research, providing an overview of the state of the art in this area. During the tutorial we will discuss the information which can be gleaned from a variety of Internet data sources, including social media, search engines, and specialized medical websites. We will provide an overview of analysis methods used in recent literature, and show how results can be evaluated using publicly available health information and online experimentation. Finally, we will discuss ethical and privacy issues and possible technological solutions. This tutorial is intended for researchers of user-generated content who are interested in applying their knowledge to improve health and medicine.

Accessible Crowdwork? Understanding the Value in and Challenge of Microtask Employment for People with Disabilities

March 2, 2015

Source: Microsoft Research

We present the first formal study of crowdworkers who have disabilities via in-depth open-ended interviews of 17 people (disabled crowdworkers and job coaches for people with disabilities) and a survey of 631 adults with disabilities. Our findings establish that people with a variety of disabilities currently participate in the crowd labor marketplace, despite challenges such as crowdsourcing workflow designs that inadvertently prohibit participation by, and may negatively affect the worker reputations of, people with disabilities. Despite such challenges, we find that crowdwork potentially offers different opportunities for people with disabilities relative to the normative office environment, such as job flexibility and lack of a need to rely on public transit. We close by identifying several ways in which crowd labor platform operators and/or individual task requestors could improve the accessibility of this increasingly important form of employment.

Supercomputers: The Amazing Race

January 13, 2015

Source: Microsoft Research (Gordon Bell)

The “ideal supercomputer” has an infinitely fast clock and executes a single instruction stream program operating on data stored in an infinitely large and fast single memory. Backus established the von Neumann programming model with FORTRAN. Supercomputers have evolved in steps: increasing processor speed, processing vectors, adding processors for a program held in a single-memory monocomputer, and interconnecting multiple computers over which a distributed program runs in parallel. Thus, supercomputing has evolved from the hardware engineering design challenge of the monocomputer in the Cray Era (1960-1995) to the challenge of creating programs that operate on the distributed (mono)computers of the Multicomputer Era (1985-present).

Identifying Presentation Styles in Online Educational Videos

January 6, 2015

Source: Microsoft Research

The rapid growth of online educational videos has resulted in huge redundancy. The same underlying content is often available in multiple videos with varying quality, presenter, and presentation style (slide show, whiteboard presentation, demo, etc.). The fact that there are so many videos on the same content makes it important to retrieve videos that are attuned to user preferences. While there are several aspects that drive user engagement, we focus on the presentation style of the video. Based on a large-scale manual study, we identify the 11 dominant presentation styles that are typically employed. We propose a reference algorithm combining a set of 3-Way Decision Forests with probabilistic fusion and using a large set of image, face and motion features. We analyze our empirical results to provide understanding of the difficulties of the problem and to highlight directions for future research on this new application. We also make the data available.
