Archive for the ‘Microsoft Research’ Category

Inverse Privacy

July 22, 2014
Source: Microsoft Research

We say that an item of your personal information is private if you have it but nobody else does. It is inversely private if somebody has it but you do not. We analyze the provenance of inverse privacy and argue that technology and appropriate public policy can reduce inverse privacy to a minimum.
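Both definitions come down to who holds an item. The sketch below classifies one item from the set of parties holding it. The party labels are hypothetical placeholders, and the code covers only the definitions, not the paper's analysis of where inverse privacy comes from.

```python
from __future__ import annotations


def classify(holders: set[str], you: str = "you") -> str:
    """Classify one item of personal information by who holds it.

    `holders` is the set of parties holding the item; the labels are
    hypothetical placeholders, not categories from the paper.
    """
    others = holders - {you}
    if you in holders and not others:
        return "private"            # you have it, nobody else does
    if others and you not in holders:
        return "inversely private"  # somebody has it, you do not
    return "neither"                # shared by you and others, or held by nobody


print(classify({"you"}))            # private
print(classify({"search_engine"}))  # inversely private
print(classify({"you", "bank"}))    # neither
```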

Using Ethical-Response Surveys to Identify Sources of Disapproval and Concern with Facebook’s Emotional Contagion Experiment and Other Controversial Studies

July 15, 2014
Source: Microsoft Research

We surveyed 3570 workers on Amazon’s Mechanical Turk to gauge their ethical response to five scenarios describing scientific experiments—including one scenario describing Facebook’s emotional contagion experiment. We will post an update of this paper containing the results and analysis on or after 12:01AM Pacific on Monday July 14.

Circumlocution in Diagnostic Medical Queries

July 9, 2014
Source: Microsoft Research

Circumlocution is the use of many words to describe what could be said with fewer, e.g., “a machine that takes moisture out of the air” instead of “dehumidifier”. Web search is a natural setting for circumlocution, since people often struggle to name what they seek. In some domains, not knowing the correct term can significantly affect the search results that are retrieved. We study the medical domain, where professional medical terms are not commonly known and where not knowing the correct term can affect the accuracy of the information surfaced, escalate anxiety, and ultimately influence the medical care sought. Given a free-form colloquial health search query, our objective is to find the underlying professional medical term. The problem is complicated by the fact that people issue quite varied queries to describe what they have. Machine-learning algorithms can be brought to bear on the problem, but there are two key complexities: creating high-quality training data and identifying predictive features. To our knowledge, no prior work has been able to crack this important problem due to the lack of training data. We give novel solutions and demonstrate their efficacy via extensive experiments, greatly improving over the prior art.
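To make the task concrete, the input is a colloquial query and the output is a professional term. The paper uses machine learning with purpose-built training data. The sketch below substitutes a much simpler token-overlap (Jaccard) baseline over a tiny hypothetical lexicon. It only illustrates the input and output of the task and is not the authors' method.

```python
from __future__ import annotations

import re

# Hypothetical toy lexicon: professional term -> one lay description.
LEXICON = {
    "tinnitus": "ringing or buzzing in the ears",
    "alopecia": "hair falling out in patches",
    "pruritus": "itchy skin",
}


def tokens(text: str) -> set[str]:
    """Lowercase alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_term(query: str) -> str:
    """Return the term whose lay description shares the most vocabulary with the query."""
    q = tokens(query)
    return max(LEXICON, key=lambda term: jaccard(q, tokens(LEXICON[term])))


print(best_term("my ears keep ringing"))  # tinnitus
```

A query such as "bald spots on my scalp" shares no token with any description, so this baseline simply returns whichever term comes first. That failure shows why varied phrasing makes training data and predictive features the hard parts of the problem.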

Assigning Educational Videos at Appropriate Locations in Textbooks

July 8, 2014
Source: Microsoft Research

The emergence of tablet devices, cloud computing, and abundant online multimedia content presents new opportunities to transform traditional paper-based textbooks into tablet-based electronic textbooks, and to further augment the educational experience by enriching them with relevant supplementary materials. The use of multimedia content such as educational videos along with textual content has been shown to improve learning outcomes. While such videos are becoming increasingly available, even a highly relevant video may be created at a granularity that does not match the organization of the textbook. We focus on the video assignment problem: given a candidate set of relevant educational videos for augmenting an electronic textbook, how do we assign the videos at appropriate locations in the textbook? We propose a rigorous formulation of the video assignment problem and present an algorithm for assigning each video to the optimum subset of logical units. We also show that our objective function exhibits submodularity and hence, when the number of logical units is large, admits an efficient greedy algorithm with provable quality guarantees. Our experimental evaluation using a diverse collection of educational videos relevant to multiple chapters in a textbook demonstrates the efficacy of the proposed techniques for inferring the granularity at which a relevant video should be assigned.
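The greedy rule that submodularity enables can be sketched as follows. The abstract does not give the paper's objective, so this sketch assumes a simple set-coverage objective: how many of a video's concepts the chosen units cover. Set coverage is monotone and submodular. The unit numbers and concepts are hypothetical.

```python
from __future__ import annotations

# Hypothetical data: the concepts each logical unit covers, and a video's concepts.
UNITS = {
    "3.1": {"force", "mass"},
    "3.2": {"acceleration", "mass"},
    "3.3": {"friction"},
    "4.1": {"energy"},
}
VIDEO = {"force", "mass", "acceleration"}


def coverage(chosen: list[str], units: dict[str, set[str]], video: set[str]) -> int:
    """Assumed objective: number of the video's concepts covered by the chosen units."""
    covered = set()
    for unit in chosen:
        covered |= units[unit] & video
    return len(covered)


def greedy_assign(units: dict[str, set[str]], video: set[str], k: int) -> list[str]:
    """Pick up to k units, each time adding the unit with the largest marginal gain."""
    chosen: list[str] = []
    for _ in range(k):
        base = coverage(chosen, units, video)
        best, best_gain = None, 0
        for unit in units:
            if unit in chosen:
                continue
            gain = coverage(chosen + [unit], units, video) - base
            if gain > best_gain:
                best, best_gain = unit, gain
        if best is None:  # no remaining unit adds anything
            break
        chosen.append(best)
    return chosen


print(greedy_assign(UNITS, VIDEO, k=3))  # ['3.1', '3.2']
```

For monotone submodular objectives under a cardinality limit, this greedy rule is known to reach at least a (1 - 1/e) fraction of the optimum. The abstract does not say whether the paper's guarantee takes this particular form.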

Analyze This! 145 Questions for Data Scientists in Software Engineering

June 18, 2014
Source: Microsoft Research

In this paper, we present the results from two surveys related to data science applied to software engineering. The first survey solicited questions that software engineers would like data scientists to investigate about software, about software processes and practices, and about software engineers. Our analyses resulted in a list of 145 questions grouped into 12 categories. The second survey asked a different pool of software engineers to rate these 145 questions and identify the most important ones to work on first. Respondents favored questions that focus on how customers typically use their applications. We also saw opposition to questions that assess the performance of individual employees or compare them with one another. Our categorization and catalog of 145 questions can help researchers, practitioners, and educators to more easily focus their efforts on topics that are important to the software industry.

Reflections on How Designers Design With Data

June 13, 2014
Source: Microsoft Research

In recent years many popular data visualizations have emerged that are created largely by designers whose main area of expertise is not computer science. Designers generate these visualizations using a handful of design tools and environments. To better inform the development of tools intended for designers working with data, we set out to understand designers’ challenges and perspectives. We interviewed professional designers, conducted observations of designers working with data in the lab, and observed designers working with data in team settings in the wild. A set of patterns emerged from these observations from which we extract a number of themes that provide a new perspective on design considerations for visualization tool creators, as well as on known engineering problems.

The Wisdom of Smaller, Smarter Crowds

June 11, 2014
Source: Microsoft Research

The “wisdom of crowds” refers to the phenomenon that aggregated predictions from a large group of people can rival or even beat the accuracy of experts. In domains with substantial stochastic elements, such as stock picking, crowd strategies (e.g., indexing) are difficult to beat. However, in domains in which some crowd members have demonstrably more skill than others, smart sub-crowds could possibly outperform the whole. The central question this work addresses is whether such smart subsets of a crowd can be identified a priori in a large-scale prediction contest that has substantial skill and luck components. We study this question with data obtained from fantasy soccer, a game in which millions of people choose professional players from the English Premier League to be on their fantasy soccer teams. The better the professional players do in real-life games, the more points fantasy teams earn. Fantasy soccer is ideally suited to this investigation because it offers millions of individual-level, within-subject predictions, past performance indicators, and the ability to test the effectiveness of arbitrary player-selection strategies. We find that smaller, smarter crowds can be identified in advance and that they beat the wisdom of the larger crowd. We also show that many players would do better by simply imitating the strategy of a player who has done well in the past. Finally, we provide a theoretical model that explains the results we see from our empirical analyses.
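The comparison between a whole crowd and a smart sub-crowd can be sketched as two aggregates over the same entries. In this hypothetical setup, the crowd's pick is the most-selected players. The sub-crowd is the members with the best past points, chosen before the round is played. The entries, player labels, and aggregation rule are assumptions, not the paper's data or model.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical contest entries: past points and this round's picks.
MEMBERS = [
    {"name": "A", "past_points": 90, "picks": ["P1", "P2"]},
    {"name": "B", "past_points": 85, "picks": ["P2", "P1"]},
    {"name": "C", "past_points": 40, "picks": ["P3", "P4"]},
    {"name": "D", "past_points": 35, "picks": ["P3", "P4"]},
    {"name": "E", "past_points": 30, "picks": ["P3", "P5"]},
]


def consensus(members: list[dict], n: int) -> list[str]:
    """The n players picked most often by the given members (ties: first seen wins)."""
    counts = Counter(p for m in members for p in m["picks"])
    return [player for player, _ in counts.most_common(n)]


def smart_subcrowd(members: list[dict], k: int) -> list[dict]:
    """The k members with the best past performance, selected a priori."""
    return sorted(members, key=lambda m: m["past_points"], reverse=True)[:k]


print(consensus(MEMBERS, 2))                     # ['P3', 'P1']
print(consensus(smart_subcrowd(MEMBERS, 2), 2))  # ['P1', 'P2']
```

Imitating a single strong player, which the abstract also reports as effective, corresponds to `smart_subcrowd(MEMBERS, 1)`. Deciding which aggregate actually wins would require the round's real outcomes, which this sketch does not model.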

Mining Videos from the Web for Electronic Textbooks

June 9, 2014
Source: Microsoft Research

We propose a system for mining videos from the web for supplementing the content of electronic textbooks in order to enhance their utility. Textbooks are generally organized into sections such that each section explains very few concepts and every concept is primarily explained in one section. Building upon these principles from the education literature and drawing upon the theory of Formal Concept Analysis, we define the focus of a section in terms of a few indicia, which themselves are combinations of concept phrases uniquely present in the section. We identify videos relevant for a section by ensuring that at least one of the indicia for the section is present in the video and measuring the extent to which the video contains the concept phrases occurring in different indicia for the section. Our user study employing two corpora of textbooks on different subjects from two countries demonstrates that our system is able to find useful videos relevant to individual sections.
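The two steps described above can be sketched as follows: building indicia for a section, then scoring a video against them. This is a simplified, hypothetical stand-in for the paper's Formal Concept Analysis definition. An indicium here is any pair of concept phrases that occur together in the section and in no other section. The section names and phrases are invented for illustration.

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical concept phrases per textbook section.
SECTIONS = {
    "photosynthesis": {"chlorophyll", "light", "glucose"},
    "respiration": {"glucose", "oxygen", "mitochondria"},
    "energy flow": {"light", "glucose", "food chain"},
}


def indicia(section: str, sections: dict[str, set[str]], size: int = 2) -> list[frozenset[str]]:
    """Combinations of `size` concept phrases that occur together in `section`
    and in no other section (simplified, hypothetical definition)."""
    others = [phrases for name, phrases in sections.items() if name != section]
    result = []
    for combo in combinations(sorted(sections[section]), size):
        candidate = frozenset(combo)
        if not any(candidate <= phrases for phrases in others):
            result.append(candidate)
    return result


def relevance(video_phrases: set[str], section_indicia: list[frozenset[str]]) -> float:
    """0 unless some indicium is fully present in the video; otherwise the fraction
    of all concept phrases appearing in the section's indicia that the video contains."""
    if not any(ind <= video_phrases for ind in section_indicia):
        return 0.0
    phrases = frozenset().union(*section_indicia)
    return len(phrases & video_phrases) / len(phrases)


ind = indicia("photosynthesis", SECTIONS)
print(relevance({"chlorophyll", "light", "sun"}, ind))  # 0.6666666666666666
print(relevance({"glucose", "light"}, ind))             # 0.0
```

The pair {"glucose", "light"} is not an indicium of the photosynthesis section because it also occurs in "energy flow". A video mentioning only those two phrases therefore scores zero.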

Permacoin: Repurposing Bitcoin Work for Data Preservation

June 2, 2014
Source: Microsoft Research

Bitcoin is widely regarded as the first broadly successful e-cash system. An oft-cited concern, though, is that mining Bitcoins wastes computational resources. Indeed, Bitcoin’s underlying mining mechanism, which we call a scratch-off puzzle (SOP), involves continuously attempting to solve computational puzzles that have no intrinsic utility.

We propose a modification to Bitcoin that repurposes its mining resources to achieve a more broadly useful goal: distributed storage of archival data. We call our new scheme Permacoin. Unlike Bitcoin and its proposed alternatives, Permacoin requires clients to invest not just computational resources, but also storage. Our scheme involves an alternative scratch-off puzzle for Bitcoin based on Proofs-of-Retrievability (PORs). Successfully minting money with this SOP requires local, random access to a copy of a file. Given the competition among mining clients in Bitcoin, this modified SOP gives rise to highly decentralized file storage, thus reducing the overall waste of Bitcoin.

Using a model of rational economic agents we show that our modified SOP preserves the essential properties of the original Bitcoin puzzle. We also provide parameterizations and calculations based on realistic hardware constraints to demonstrate the practicality of Permacoin as a whole.
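The difference between a pure hash puzzle and a storage-bound scratch-off puzzle can be sketched as follows. Each attempt derives pseudorandom block indices from the nonce and hashes those file blocks into the ticket, so trying nonces quickly requires fast local access to the stored data. This is a hypothetical simplification, not Permacoin's construction. A full proof-of-retrievability scheme would attach a Merkle proof to each accessed block so that verifiers need not hold the file. This sketch omits those proofs, along with erasure coding and the economic parameters.

```python
from __future__ import annotations

import hashlib


def h(*parts: bytes) -> bytes:
    """SHA-256 of the concatenated parts."""
    return hashlib.sha256(b"".join(parts)).digest()


def attempt(puzzle: bytes, nonce: int, blocks: list[bytes], k: int = 4) -> bytes:
    """One scratch-off attempt that must read k pseudorandomly chosen file blocks.

    The indices depend on the nonce, so they cannot be predicted in advance."""
    seed = h(puzzle, nonce.to_bytes(8, "big"))
    digest = seed
    for i in range(k):
        index = int.from_bytes(h(seed, i.to_bytes(4, "big")), "big") % len(blocks)
        digest = h(digest, blocks[index])
    return digest


def wins(digest: bytes, difficulty_bits: int) -> bool:
    """Success if the digest, read as an integer, falls below the target."""
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))


def mine(puzzle: bytes, blocks: list[bytes], difficulty_bits: int,
         max_nonce: int = 1_000_000) -> int | None:
    """Try nonces in order; return the first winning nonce, or None."""
    for nonce in range(max_nonce):
        if wins(attempt(puzzle, nonce, blocks), difficulty_bits):
            return nonce
    return None


blocks = [bytes([i]) * 64 for i in range(16)]  # stand-in for archived file segments
nonce = mine(b"previous-block-hash", blocks, difficulty_bits=8)
print(nonce, wins(attempt(b"previous-block-hash", nonce, blocks), 8))
```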

Online and social media data as a flawed continuous panel survey

May 24, 2014
Source: Microsoft Research

There is a large body of research on utilizing online activity to predict various real world outcomes, ranging from outbreaks of influenza to outcomes of elections. There is considerably less work, however, on using this data to understand topic-specific interest and opinion amongst the general population and specific demographic subgroups, as currently measured by relatively expensive surveys. Here we investigate this possibility by studying a full census of all Twitter activity during the 2012 election cycle along with comprehensive search history of a large panel of internet users during the same period, highlighting the challenges in interpreting online and social media activity as the results of a survey. As noted in existing work, the online population is a non-representative sample of the offline world (e.g., the U.S. voting population). We extend this work to show how demographic skew and user participation are non-stationary and unpredictable over time. In addition, the nature of user contributions varies wildly around important events. Finally, we note subtle problems in mapping what people are sharing or consuming online to specific sentiment or opinion measures around a particular topic. These issues must be addressed before meaningful insight about public interest and opinion can be reliably extracted from online and social media data.

Automatic Characterization of Speaking Styles in Educational Videos

May 21, 2014
Source: Microsoft Research

Recent studies have shown the importance of using online videos along with textual material in educational instruction, especially for better content retention and improved concept understanding. A key question is how to select videos to maximize student engagement, particularly when there are multiple possible videos on the same topic. While there are many aspects that drive student engagement, in this paper we focus on presenter speaking styles in the video. We use crowd-sourcing to explore speaking style dimensions in online educational videos, and identify six broad dimensions: liveliness, speaking rate, pleasantness, clarity, formality and confidence. We then propose techniques based solely on acoustic features for automatically identifying a subset of the dimensions. Finally, we perform video re-ranking experiments to learn how users apply their speaking style preferences to augment textbook material. Our findings also indicate how certain dimensions are correlated with perceptions of general pleasantness of the voice.

Curation through use: Understanding the personal value of social media

May 19, 2014
Source: Microsoft Research

Content generation on social network sites has been considered mainly from the perspective of individuals interacting with social network contacts. Yet research has also pointed to the potential for social media to become a meaningful personal archive over time. The aim of this paper is to consider how social media, over time and across sites, forms part of the wider digital archiving space for individuals. Our findings, from a qualitative study of 14 social media users, highlight how although some sites are more associated with ‘keepable’ social media than others, even those are not seen as archives in the usual sense of the word. We show how this perception is bound up with five contradictions, which center on social media as curated, as a reliable repository of meaningful content, as readily encountered and as having the potential to present content as a compelling narrative. We conclude by highlighting opportunities for design relating to curation through use and what this implies for personal digital archives, which are known to present difficulties in terms of curation and re-finding.

The Thin Lines between Data Analysis and Surveillance: Reflections on a Research History

May 13, 2014
Source: Microsoft Research

Where are the lines between ‘big data analytics’ and ‘surveillance’? As a researcher in the former, and an outspoken skeptic of the latter, I review my own research to examine how my attempts to manage privacy in collecting and visualizing data have worked out. Interestingly, perhaps distressingly, it seems that even when projects are designed around viewing and displaying privacy-enhanced aggregates, it is easier to discuss them in terms of individual behavior and single subjects: a path that can lead toward accidentally building surveillance systems.

PopTherapy: Coping with Stress through Pop-Culture

May 12, 2014
Source: Microsoft Research

Stress is considered a modern-day “global epidemic”. Given how widespread the problem is, it would be beneficial if solutions that help people learn to cope better with stress could scale beyond what individual or group therapies can provide today. In this work, we therefore study the potential of smartphones as a pervasive medium to provide “crowd therapy”. The work melds two novel contributions: first, a micro-intervention authoring process that focuses on repurposing popular web applications as stress management interventions; and second, a machine-learning-based intervention recommender system that learns how to match interventions to individuals and their temporal circumstances over time. After four weeks, participants in our user study reported higher self-awareness of stress, fewer depression-related symptoms, and having learned new simple ways to deal with stress. Furthermore, participants who received the machine-learning recommendations without the option to select different ones showed a tendency toward using more constructive coping behaviors.
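The abstract does not name the recommender's algorithm. One common way to learn which intervention suits a person in a given context over time is a per-context epsilon-greedy bandit, sketched below as an assumption rather than the authors' method. It further assumes that rewards are user ratings in [0, 1] and that contexts are coarse labels such as a time-of-day bucket. The intervention names are hypothetical.

```python
from __future__ import annotations

import random
from collections import defaultdict


class EpsilonGreedyRecommender:
    """Per-context epsilon-greedy choice among interventions (hypothetical sketch)."""

    def __init__(self, interventions: list[str], epsilon: float = 0.1, seed: int | None = None):
        self.interventions = interventions
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.totals = defaultdict(float)  # (context, intervention) -> summed reward
        self.counts = defaultdict(int)    # (context, intervention) -> times rated

    def mean(self, context: str, intervention: str) -> float:
        """Average reward so far; 0.0 for combinations never tried."""
        n = self.counts[(context, intervention)]
        return self.totals[(context, intervention)] / n if n else 0.0

    def recommend(self, context: str) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.interventions)  # explore: random intervention
        return max(self.interventions, key=lambda i: self.mean(context, i))  # exploit

    def update(self, context: str, intervention: str, reward: float) -> None:
        self.counts[(context, intervention)] += 1
        self.totals[(context, intervention)] += reward


rec = EpsilonGreedyRecommender(["breathing exercise", "funny video", "message a friend"], epsilon=0.0)
rec.update("evening", "funny video", 1.0)
rec.update("evening", "breathing exercise", 0.2)
print(rec.recommend("evening"))  # funny video
```

Keeping one instance per participant lets the sketch learn individual preferences. Because untried combinations start at 0.0, pure exploitation favors interventions that have already been rated. The epsilon parameter keeps occasionally trying the others.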

NewsPad: Designing for Collaborative Storytelling in Neighborhoods

April 28, 2014
Source: Microsoft Research

This paper introduces design explorations in neighborhood collaborative storytelling. We focus on blogs and citizen journalism, which have been celebrated as a means to meet the reporting needs of small local communities. However, these bloggers have limited capacity, and social media feeds seldom have the context or readability of news stories. We present NewsPad, a content editor that helps communities create structured stories, collaborate in real time, recruit contributors, and syndicate the editing process. We evaluate NewsPad in four pilot deployments and find that the design elicits collaborative story creation.

Seeking and Sharing Health Information Online: Comparing Search Engines and Social Media

April 25, 2014
Source: Microsoft Research

Search engines and social media are two of the most commonly used online services; in this paper, we examine how users appropriate these platforms for online health activities via both large-scale log analysis and a survey of 210 people. While users often turn to search engines to learn about serious or highly stigmatized conditions, a surprising amount of sensitive health information is also sought and shared via social media, in our case the public social platform Twitter. We contrast what health content people seek via search engines vs. share on social media, as well as why they choose a particular platform for online health activities. We reflect on the implications of our results for designing search engines, social media, and social search tools that better support people’s health information seeking and sharing needs.

Searching for Myself: Motivations and Strategies for Self-Search

April 16, 2014
Source: Microsoft Research

We present findings from a qualitative study of self-search, also known as ego or vanity search. In the context of a broader study about personal online content, participants were asked to search for themselves using their own computers and the browsers and queries they would normally adopt. Our analysis highlights five motivations for self-search: as a form of identity management; to discover reactions to and reuse of user-generated media; to re-find personal content; as a form of entertainment; and to reveal lost or forgotten content. Strategies vary according to motivation, and may differ markedly from typical information-seeking, with users looking deep into the results and using image search to identify content about themselves. We argue that two dimensions underpin ways of improving self-search: controllability and expectedness, and discuss what these dimensions imply for design.

Bored Tuesdays and Focused Afternoons: The Rhythm of Attention and Online Activity in the Workplace

April 11, 2014
Source: Microsoft Research

While distractions due to digital media have received attention in HCI, we instead examine focused attention in the workplace. We logged digital activity and continually probed perspectives of 32 information workers for five days in situ to understand how attentional states change with context. We present a framework of how engagement and challenge in work relate to focus, boredom, and rote work. Overall, we find more focused attention than boredom in the workplace. Reported focus peaks mid-afternoon while boredom is highest in the morning. People are happiest doing rote work; we show that focused work can involve stress. We identified higher levels of boredom mid-week. Online activities are associated with different attentional states, showing different patterns at the beginning and end of the day, and before and after a mid-day break. Our study shows how rhythms of attentional states are associated with context.

I am a Smartphone and I Know My User is Driving

January 23, 2014
Source: Microsoft Research

We intend to develop a smartphone app that can distinguish whether its user is a driver or a passenger in an automobile. While the core problem can be solved relatively easily with special installations in new high-end vehicles (e.g., NFC), constraints of backward compatibility make the problem far more challenging. We design a Driver Detection System (DDS) that relies entirely on smartphone sensors and is thereby compatible with all automobiles. Our approach harnesses smartphone sensors to recognize micro-activities in humans that in turn discriminate between the driver and the passenger. We demonstrate an early prototype of this system on Android Nexus S phones and Apple iPhones. Reported results show greater than 85% accuracy across 6 users in 2 different cars.

InterPoll: Crowd-Sourced Internet Polls (Done Right)

January 22, 2014
Source: Microsoft Research

Crowd-sourcing is increasingly being used to provide answers to online polls and surveys. However, existing systems, while taking care of the mechanics of attracting crowd workers, poll building, and payment, provide little to help the survey-maker or pollster obtain statistically significant results free of even the most obvious selection biases.

This paper proposes InterPoll, a platform for programming crowd-sourced polls. Polls are expressed as embedded LINQ queries, whose results are provided to the developer. InterPoll supports reasoning about uncertainty, enabling t-tests and similar analyses on random variables obtained from the crowd. InterPoll performs query optimization, bias correction, and power analysis, among other features. Making InterPoll queries part of the surrounding program allows for optimizations that take advantage of the surrounding code context. The goal of InterPoll is to provide a system that can be reliably used for research into marketing, social and political science questions.

This paper highlights some of the existing challenges and how InterPoll is designed to address most of them. We outline some of the optimizations and give numerous motivating examples designed to illustrate our system design. Note that this paper is an outline of our vision—we deliberately focus on examples and motivation and leave a detailed technical treatment for future work.
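Power analysis, one of the features listed above, answers how many respondents a poll needs before a difference of a given size becomes detectable. InterPoll expresses polls as LINQ queries, and the Python function below is not its API. It is a hypothetical sketch of the standard normal-approximation sample-size formula for comparing two proportions with a two-sided z-test.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Respondents needed in each of two groups for a two-sided z-test to detect
    p1 != p2 at significance level alpha with the given power (normal approximation)."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.8
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


print(sample_size_two_proportions(0.5, 0.6))  # 385
```

Detecting a 10-point gap around 50% thus takes roughly 385 crowd workers per group. Halving the gap roughly quadruples that number, which is why a system that recruits paid respondents benefits from doing this calculation before the poll runs.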

