Password Portfolios and the Finite-Effort User: Sustainably Managing Large Numbers of Accounts

August 7, 2014

Source: Microsoft Research

We explore how to manage a portfolio of passwords. We review why mandating exclusively strong passwords with no re-use gives users an impossible task as portfolio size grows. We find that approaches justified by loss-minimization alone, and those that ignore important attack vectors (e.g., vectors exploiting re-use), are amenable to analysis but unrealistic. In contrast, we propose, model and analyze portfolio management under a realistic attack suite, with an objective function costing both loss and user effort. Our findings directly challenge accepted wisdom and conventional advice. We find, for example, that a portfolio strategy ruling out weak passwords or password re-use is sub-optimal. We give an optimal solution for how to group accounts for re-use, and model-based principles for portfolio management.

Inverse Privacy

July 22, 2014

Source: Microsoft Research

We say that an item of your personal information is private if you have it but nobody else does. It is inversely private if somebody has it but you do not. We analyze the provenance of inverse privacy and argue that technology and appropriate public policy can reduce inverse privacy to a minimum.

Using Ethical-Response Surveys to Identify Sources of Disapproval and Concern with Facebook’s Emotional Contagion Experiment and Other Controversial Studies

July 15, 2014

Source: Microsoft Research

We surveyed 3570 workers on Amazon’s Mechanical Turk to gauge their ethical response to five scenarios describing scientific experiments—including one scenario describing Facebook’s emotional contagion experiment. We will post an update of this paper containing the results and analysis on or after 12:01AM Pacific on Monday July 14.

Circumlocution in Diagnostic Medical Queries

July 9, 2014

Source: Microsoft Research

Circumlocution is the use of many words to describe what could be said with fewer, e.g., “a machine that takes moisture out of the air” instead of “dehumidifier”. Web search is a perfect backdrop for circumlocution, where people struggle to name what they seek. In some domains, not knowing the correct term can have a significant impact on the search results that are retrieved. We study the medical domain, where professional medical terms are not commonly known and where not knowing the correct term can affect the accuracy of surfaced information, the escalation of anxiety, and ultimately the medical care sought. Given a free-form colloquial health search query, our objective is to find the underlying professional medical term. The problem is complicated by the fact that people issue quite varied queries to describe what they have. Machine-learning algorithms can be brought to bear on the problem, but there are two key complexities: creating high-quality training data and identifying predictive features. To our knowledge, no prior work has been able to crack this important problem due to the lack of training data. We give novel solutions and demonstrate their efficacy via extensive experiments, greatly improving over the prior art.

Assigning Educational Videos at Appropriate Locations in Textbooks

July 8, 2014

Source: Microsoft Research

The emergence of tablet devices, cloud computing, and abundant online multimedia content presents new opportunities to transform traditional paper-based textbooks into tablet-based electronic textbooks, and to further augment the educational experience by enriching them with relevant supplementary materials. The use of multimedia content such as educational videos along with textual content has been shown to improve learning outcomes. While such videos are becoming increasingly available, even a highly relevant video can be created at a granularity that may not mimic the organization of the textbook. We focus on the video assignment problem: Given a candidate set of relevant educational videos for augmenting an electronic textbook, how do we assign the videos at appropriate locations in the textbook? We propose a rigorous formulation of the video assignment problem and present an algorithm for assigning each video to the optimum subset of logical units. We also show that our objective function exhibits submodularity and hence admits an efficient greedy algorithm with provable quality guarantees, when the number of logical units is large. Our experimental evaluation using a diverse collection of educational videos relevant to multiple chapters in a textbook demonstrates the efficacy of the proposed techniques for inferring the granularity at which a relevant video should be assigned.
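
The abstract does not state the objective function, so the following is only a generic sketch of greedy selection for a monotone submodular objective under a size budget, not the paper's algorithm. The names `greedy_select`, `coverage`, `UNIT_CONCEPTS`, and the toy data are hypothetical and chosen purely for illustration. For this classic setting, greedy selection is known to achieve at least a (1 - 1/e) fraction of the optimal value; the guarantee proved in the paper may differ.

```python
from typing import Callable, FrozenSet, Iterable, List

def greedy_select(
    candidates: Iterable[str],
    objective: Callable[[FrozenSet[str]], float],
    budget: int,
) -> List[str]:
    """Pick up to `budget` items, each time adding the largest marginal gain."""
    chosen: List[str] = []
    remaining = list(candidates)
    current = objective(frozenset())
    for _ in range(budget):
        best_item, best_gain = None, 0.0
        for item in remaining:
            gain = objective(frozenset(chosen + [item])) - current
            if gain > best_gain:  # strict: ties keep the earlier candidate
                best_item, best_gain = item, gain
        if best_item is None:  # no candidate improves the objective
            break
        chosen.append(best_item)
        remaining.remove(best_item)
        current += best_gain
    return chosen

# Hypothetical toy data: which concepts from one video each textbook
# section (logical unit) touches on. Not taken from the paper.
UNIT_CONCEPTS = {
    "sec1": {"a", "b"},
    "sec2": {"b", "c", "d"},
    "sec3": {"d"},
    "sec4": {"e"},
}

def coverage(units: FrozenSet[str]) -> float:
    """Toy submodular objective: number of distinct concepts covered."""
    covered = set()
    for unit in units:
        covered |= UNIT_CONCEPTS[unit]
    return float(len(covered))

selected = greedy_select(UNIT_CONCEPTS.keys(), coverage, budget=2)
print(selected)
```

Coverage is used here only because it is a standard example of a monotone submodular function; any objective with diminishing marginal gains could be passed to `greedy_select` in its place.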

Analyze This! 145 Questions for Data Scientists in Software Engineering

June 18, 2014

Source: Microsoft Research

In this paper, we present the results from two surveys related to data science applied to software engineering. The first survey solicited questions that software engineers would like data scientists to investigate about software, about software processes and practices, and about software engineers. Our analyses resulted in a list of 145 questions grouped into 12 categories. The second survey asked a different pool of software engineers to rate these 145 questions and identify the most important ones to work on first. Respondents favored questions that focus on how customers typically use their applications. We also saw opposition to questions that assess the performance of individual employees or compare them with one another. Our categorization and catalog of 145 questions can help researchers, practitioners, and educators to more easily focus their efforts on topics that are important to the software industry.

Reflections on How Designers Design With Data

June 13, 2014

Source: Microsoft Research

In recent years, many popular data visualizations have emerged that are created largely by designers whose main area of expertise is not computer science. Designers generate these visualizations using a handful of design tools and environments. To better inform the development of tools intended for designers working with data, we set out to understand designers’ challenges and perspectives. We interviewed professional designers, conducted observations of designers working with data in the lab, and observed designers working with data in team settings in the wild. A set of patterns emerged from these observations, from which we extract a number of themes that provide a new perspective on design considerations for visualization tool creators, as well as on known engineering problems.


