Archive for the ‘Microsoft Research’ Category

Searching for Myself: Motivations and Strategies for Self-Search

April 16, 2014 Comments off

Searching for Myself: Motivations and Strategies for Self-Search
Source: Microsoft Research

We present findings from a qualitative study of self-search, also known as ego or vanity search. In the context of a broader study about personal online content, participants were asked to search for themselves using their own computers and the browsers and queries they would normally adopt. Our analysis highlights five motivations for self-search: as a form of identity management; to discover reactions to and reuse of user-generated media; to re-find personal content; as a form of entertainment; and to reveal lost or forgotten content. Strategies vary according to motivation, and may differ markedly from typical information-seeking, with users looking deep into the results and using image search to identify content about themselves. We argue that two dimensions underpin ways of improving self-search: controllability and expectedness, and discuss what these dimensions imply for design.

Bored Tuesdays and Focused Afternoons: The Rhythm of Attention and Online Activity in the Workplace

April 11, 2014 Comments off

Bored Tuesdays and Focused Afternoons: The Rhythm of Attention and Online Activity in the Workplace
Source: Microsoft Research

While distractions due to digital media have received attention in HCI, we examine instead focused attention in the workplace. We logged digital activity and continually probed perspectives of 32 information workers for five days in situ to understand how attentional states change with context. We present a framework of how engagement and challenge in work relate to focus, boredom, and rote work. Overall, we find more focused attention than boredom in the workplace. Reported focus peaks mid-afternoon while boredom is highest in the morning. People are happiest doing rote work; we show that focused work can involve stress. We identified higher levels of boredom mid-week. Online activities are associated with different attentional states, showing different patterns at beginning and end of day, and before and after a mid-day break. Our study shows how rhythms of attentional states are associated with context.

I am a Smartphone and I Know My User is Driving

January 23, 2014 Comments off

I am a Smartphone and I Know My User is Driving
Source: Microsoft Research

We intend to develop a smartphone app that can distinguish whether its user is a driver or a passenger in an automobile. While the core problem can be solved relatively easily with special installations in new high-end vehicles (e.g., NFC), constraints of backward compatibility make the problem far more challenging. We design a Driver Detection System (DDS) that relies entirely on smartphone sensors, and is thereby compatible with all automobiles. Our approach harnesses smartphone sensors to recognize micro-activities in humans, which in turn discriminate between the driver and the passenger. We demonstrate an early prototype of this system on Android Nexus S and Apple iPhones. Reported results show greater than 85% accuracy across 6 users in 2 different cars.

InterPoll: Crowd-Sourced Internet Polls (Done Right)

January 22, 2014 Comments off

InterPoll: Crowd-Sourced Internet Polls (Done Right)
Source: Microsoft Research

Crowd-sourcing is increasingly being used for providing answers to online polls and surveys. However, existing systems, while taking care of the mechanics of attracting crowd workers, poll building, and payment, provide little that would help the survey-maker or pollster to obtain statistically significant results devoid of even the obvious selection biases.

This paper proposes InterPoll, a platform for programming of crowd-sourced polls. Polls are expressed as embedded LINQ queries, whose results are provided to the developer. InterPoll supports reasoning about uncertainty, enabling t-tests, etc. on random variables obtained from the crowd. InterPoll performs query optimization, as well as bias correction and power analysis, among other features. Making InterPoll queries part of the surrounding program allows for optimizations that take advantage of the surrounding code context. The goal of InterPoll is to provide a system that can be reliably used for research into marketing, social and political science questions.

This paper highlights some of the existing challenges and how InterPoll is designed to address most of them. We outline some of the optimizations and give numerous motivating examples designed to illustrate our system design. Note that this paper is an outline of our vision—we deliberately focus on examples and motivation and leave a detailed technical treatment for future work.

Mining Videos from the Web for Electronic Textbooks

January 17, 2014 Comments off

Mining Videos from the Web for Electronic Textbooks
Source: Microsoft Research

We propose a system for mining videos from the web for supplementing the content of electronic textbooks in order to enhance their utility. Textbooks are generally organized into sections such that each section explains very few concepts and every concept is primarily explained in one section. Building upon these principles from the education literature and drawing upon the theory of Formal Concept Analysis, we define the focus of a section in terms of a few indicia, which themselves are combinations of concept phrases uniquely present in the section. We identify videos relevant for a section by ensuring that at least one of the indicia for the section is present in the video and measuring the extent to which the video contains the concept phrases occurring in different indicia for the section. Our user study, employing two corpora of textbooks on different subjects from two countries, demonstrates that our system is able to find useful videos relevant to individual sections.

Route Planning in Transportation Networks

January 15, 2014 Comments off

Route Planning in Transportation Networks
Source: Microsoft Research

We survey recent advances in algorithms for route planning in transportation networks. For road networks, we show that one can compute driving directions in milliseconds or less even at continental scale. A variety of techniques provide different trade-offs between preprocessing effort, space requirements, and query time. Some algorithms can answer queries in a fraction of a microsecond, while others can deal efficiently with real-time traffic. Journey planning on public transportation systems, although conceptually similar, is a significantly harder problem due to its inherent time-dependent and multicriteria nature. Although exact algorithms are fast enough for interactive queries on metropolitan transit systems, dealing with continent-sized instances requires approximations or simplifications. The multimodal route planning problem, which seeks journeys combining schedule-based transportation (buses, trains) with unrestricted modes (walking, driving), is even harder, relying on approximate solutions even for metropolitan inputs.

Balancing Burden and Benefit: Non-Prescribed Use of Employer-Issued Mobile Devices

January 7, 2014 Comments off

Balancing Burden and Benefit: Non-Prescribed Use of Employer-Issued Mobile Devices
Source: Microsoft Research

Mobile devices are increasingly powerful and flexible tools for computing and communication. When ICTD workers are given a mobile phone ‘for work’, what else do they do? And to what extent can or should an employer shape that use? This note presents research in progress, focused on rules that development projects impose to govern use of mobile devices. This work maps these rules against actual instrumental (work-related, non-prescribed) and non-instrumental (personal) device use, and enforcement of these rules, in eight projects using a popular mobile-based job aid, CommCare. We present early insights from qualitative analysis of two such deployments in India identifying a range of often conflicting policy choices that affect device use for project mission and/or professional and personal empowerment. We explore tradeoffs for morale, work quality, mission, and device integrity. We identify user remote availability, soft intimidation, and validation as mechanisms to shift authority and credibility of information sources. The implications of our findings are increasingly important as governments and NGOs arm frontline workers with mobile devices as tools to improve service delivery.

Duplicate News Story Detection Revisited

December 24, 2013 Comments off

Duplicate News Story Detection Revisited
Source: Microsoft Research

In this paper, we investigate near-duplicate detection, particularly looking at the detection of evolving news stories. These stories often consist primarily of syndicated information, with local replacement of headlines, captions, and the addition of locally-relevant content. By detecting near-duplicates, we can offer users only those stories with content materially different from previously-viewed versions of the story. We expand on previous work and improve the performance of near-duplicate document detection by weighting the phrases in a sliding window based on the term frequency within the document of terms in that window and inverse document frequency of those phrases. We experiment on a subset of a publicly available web collection that is comprised solely of documents from news web sites.

News articles are particularly challenging due to the prevalence of syndicated articles, where very similar articles are run with different headlines and surrounded by different HTML markup and site templates. We evaluate these algorithmic weightings using human judgments to determine similarity. We find that our techniques outperform the state of the art with statistical significance and are more discriminating when faced with a diverse collection of documents.
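
As a rough illustration of the weighting idea described above, the following Python sketch weights each sliding-window phrase by the in-document frequency of its terms and by an inverse document frequency for the phrase, then compares two documents with a weighted Jaccard score. This is not the paper's algorithm: the window size `k`, the smoothed IDF formula, the `shingle_df` lookup table, and the use of weighted Jaccard are all assumptions made for the example.

```python
import math
from collections import Counter


def shingles(tokens, k=3):
    """Return every k-token sliding window (phrase) in the token list."""
    return [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]


def weighted_shingles(tokens, shingle_df, n_docs, k=3):
    """Weight each distinct phrase by term frequency and phrase IDF.

    shingle_df maps a phrase to the number of corpus documents containing
    it (a hypothetical precomputed table); n_docs is the corpus size.
    """
    tf = Counter(tokens)
    weights = {}
    for sh in set(shingles(tokens, k)):
        # Relative in-document frequency of the terms inside the window.
        term_weight = sum(tf[t] for t in sh) / len(tokens)
        # Smoothed inverse document frequency of the phrase (assumed form).
        idf = math.log((1 + n_docs) / (1 + shingle_df.get(sh, 0))) + 1.0
        weights[sh] = term_weight * idf
    return weights


def weighted_jaccard(w1, w2):
    """Weighted Jaccard similarity of two phrase-weight maps, in [0, 1]."""
    keys = w1.keys() | w2.keys()
    num = sum(min(w1.get(key, 0.0), w2.get(key, 0.0)) for key in keys)
    den = sum(max(w1.get(key, 0.0), w2.get(key, 0.0)) for key in keys)
    return num / den if den else 0.0
```

Under this sketch, rare phrases built from terms that recur in the document dominate the score, so a swapped headline or a changed site template moves the similarity less than a change to the body of the story.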

Remote Shopping Advice: Enhancing In-Store Shopping with Social Technologies

November 26, 2013 Comments off

Remote Shopping Advice: Enhancing In-Store Shopping with Social Technologies (PDF)
Source: Microsoft Research

Consumers shopping in “brick-and-mortar” (non-virtual) stores often use their mobile phones to consult with others about potential purchases. Via a survey (n = 200), we detail current practices in seeking remote shopping advice. We then consider how emerging social platforms, such as social networking sites and crowd labor markets, could offer rich next-generation remote shopping advice experiences. We conducted a field experiment in which shoppers shared photographs of potential purchases via MMS, Facebook, and Mechanical Turk. Paid crowdsourcing, in particular, proved surprisingly useful and influential as a means of augmenting in-store shopping. Based on our findings, we offer design suggestions for next-generation remote shopping advice systems.

Give in to Procrastination and Stop Prefetching

November 25, 2013 Comments off

Give in to Procrastination and Stop Prefetching
Source: Microsoft Research

Generations of computer programmers are taught to prefetch network objects in computer science classes. In practice, prefetching can be harmful to the user’s wallet when she is on a limited or pay-per-byte cellular data plan. Many popular, professionally-written smartphone apps today prefetch large amounts of network data that the typical user may never use. We present Procrastinator, which automatically decides when to fetch each network object that an app requests. This decision is made based on whether the user is on Wi-Fi or cellular, how many bytes are remaining on the user’s data plan, and whether the object is needed at the present time. Procrastinator requires no developer effort, app source code, or OS changes: it modifies the app binary to trap specific system calls and inject custom code. Depending on the user and the app, our system achieves anywhere from no savings to 4X savings in bytes transferred. In theory, we can achieve 17X savings, but we need to overcome additional technical challenges.
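
The three inputs to the fetch decision named above (network type, remaining plan bytes, and whether the object is needed now) can be combined in many ways. The Python sketch below shows one hypothetical policy. The `FetchContext` type, the `should_fetch_now` function, and the `reserve_bytes` threshold are illustrative assumptions, not Procrastinator's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FetchContext:
    """Hypothetical snapshot of the inputs the abstract lists."""
    on_wifi: bool
    plan_bytes_remaining: int
    needed_now: bool


def should_fetch_now(ctx: FetchContext, object_bytes: int,
                     reserve_bytes: int = 50_000_000) -> bool:
    """Decide whether to fetch an object now or defer it.

    Assumed policy: always fetch what is needed immediately, prefetch
    freely on Wi-Fi, and on cellular prefetch only if doing so leaves at
    least reserve_bytes on the data plan.
    """
    if ctx.needed_now:
        return True
    if ctx.on_wifi:
        return True
    return ctx.plan_bytes_remaining - object_bytes >= reserve_bytes
```

For example, a 5 MB prefetch on a cellular plan with 20 MB left would be deferred under the default reserve, while the same request on Wi-Fi would proceed.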

Mining Large-scale TV Group Viewing Patterns for Group Recommendation

November 21, 2013 Comments off

Mining Large-scale TV Group Viewing Patterns for Group Recommendation
Source: Microsoft Research

We present a large-scale study of television viewing habits, focusing on how individuals adapt their preferences when consuming content in group settings. While there has been a great deal of recent work on modeling individual preferences, there has been considerably less work studying the behavior and preferences of groups, due mostly to the difficulty of data collection in these settings. In contrast to past work that has relied either on small-scale surveys or prototypes, we explore more than 4 million logged views paired with individual-level demographic and co-viewing information to uncover variation in the viewing patterns of individuals and groups. Our analysis reveals which genres are popular among specific demographic groups when viewed individually, how often individuals from different demographic categories participate in group viewing, and how viewing patterns change in various group contexts. Furthermore, we leverage this large-scale dataset to directly estimate how individual preferences are combined in group settings, finding subtle deviations from traditional preference aggregation functions. We present a simple model which captures these effects and discuss the impact of these findings on the design of group recommendation systems.

Interactive Genomics: Rapidly Querying Genomes in the Cloud

November 20, 2013 Comments off

Interactive Genomics: Rapidly Querying Genomes in the Cloud
Source: Microsoft Research

Genome sequence data is now “Big Data” in both volume and velocity. Joined with medical records, genome data can be mined for insights for treating disease. Genomics today is dominated by batch processing: simple analytical questions take days to answer. We propose instead that genomics be made interactive so that queries on a large genome database in the cloud are answered across the network in seconds. Towards this vision, we introduce a query language, Genome Query Language (GQL), in which intervals are first class, and joins are based on intersection not equality. GQL can be used to query for large structural variations on the TCGA cancer archive using only 5-10 lines of high level code that takes around 60 seconds to execute in the Azure cloud on an input BAM file of 83 GB. GQL results can be incrementally deployed both on the UCSC browser and by refactoring an existing variant caller to provide 6x speedup. Our paper focuses on the system design and five key optimizations — cached parsing, lazy joins, materialized views and chromosomal parallelism — that speed up query processing by 100x. We also reflect on 3 years of experience designing and using GQL.
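
To make the idea of joining on intersection rather than equality concrete, here is a small Python sketch of an interval join. It is not GQL syntax or GQL's implementation: the `Interval` type, the half-open coordinate convention, and the nested-loop strategy are assumptions chosen for readability.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    """A genomic interval; start is inclusive and end is exclusive (assumed)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True when both intervals lie on the same chromosome and intersect."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def intersect_join(left: List[Interval],
                   right: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Pair every left interval with every right interval it intersects.

    A nested loop keeps the sketch short; a production system would more
    likely sort both inputs and sweep, or use an interval index.
    """
    return [(a, b) for a in left for b in right if overlaps(a, b)]
```

An equality join would pair only rows with identical coordinates, whereas this join pairs a read spanning positions 100 to 200 with a region spanning 150 to 300, which is the kind of relationship genomic queries usually ask about.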

Nature of Information, People, and Relationships in Digital Social Networks

November 7, 2013 Comments off

Nature of Information, People, and Relationships in Digital Social Networks
Source: Microsoft Research

This paper summarizes the results of our recent investigations into how information propagates, how people assimilate information, and how people form relationships to gain information in Internet-centric social settings. It includes key ideas related to the role of the nature of information items in information diffusion as well as the notion of receptivity on the part of the receiver and how it affects information assimilation and opinion formation. It describes a system that incorporates availability, willingness, and knowledge in recommending friends to a person seeking advice from a social network. It discusses whether having common interests makes it more likely for a pair of users to be friends and whether being friends influences the likelihood of having common interests, and quantifies the influence of various factors in an individual’s continued relationship with a social group. Finally, it gives current research directions related to privacy and social analytics.

Analyze This! 145 Questions for Data Scientists in Software Engineering

October 30, 2013 Comments off

Analyze This! 145 Questions for Data Scientists in Software Engineering
Source: Microsoft Research

In this paper, we present the results from two surveys related to data science applied to software engineering. The first survey solicited questions that software engineers would like to ask data scientists to investigate about software, software processes and practices, and about software engineers. Our analysis resulted in a list of 145 questions grouped into 12 categories. The second survey asked a different pool of software engineers to rate the 145 questions and identify the most important ones to work on first. Respondents favored questions that focus on how customers typically use their applications. We also see opposition to questions that assess the performance of individual employees or compare them to one another. Our categorization and catalog of 145 questions will help researchers, practitioners, and educators to more easily focus their efforts on topics that are important to the software industry.

Beyond Clicks: Query Reformulation as a Predictor of Search Satisfaction

October 24, 2013 Comments off

Beyond Clicks: Query Reformulation as a Predictor of Search Satisfaction
Source: Microsoft Research

To understand whether a user is satisfied with the current search results, implicit behavior is a useful data source, with clicks being the best-known implicit signal. However, it is possible for a non-clicking user to be satisfied and a clicking user to be dissatisfied. Here we study additional implicit signals based on the relationship between the user’s current query and the next query, such as their textual similarity and the inter-query time. Using a large unlabeled dataset, a labeled dataset of queries and a labeled dataset of user tasks, we analyze the relationship between these signals. We identify an easily-implemented rule that indicates dissatisfaction: that a similar query issued within a time interval that is short enough (such as five minutes) implies dissatisfaction. By incorporating additional query-based features in the model, we show that a query-based model (with no click information) can indicate satisfaction more accurately than click-based models. The best model uses both query and click features. In addition, by comparing query sequences in successful tasks and unsuccessful tasks, we observe that search success is an incremental process for successful tasks with multiple queries.
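
The dissatisfaction rule described above (a similar query reissued within a short interval) is simple enough to sketch in Python. The similarity measure and both thresholds below are assumptions for illustration: the abstract gives five minutes only as an example interval, and `difflib.SequenceMatcher` stands in for whatever textual similarity the authors used.

```python
from difflib import SequenceMatcher


def query_similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1] (a stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def implies_dissatisfaction(query: str, next_query: str, gap_seconds: float,
                            min_similarity: float = 0.6,
                            max_gap_seconds: float = 300.0) -> bool:
    """Flag dissatisfaction when a similar query follows quickly.

    min_similarity is a hypothetical threshold; max_gap_seconds defaults
    to the five-minute example from the abstract.
    """
    quick = gap_seconds <= max_gap_seconds
    similar = query_similarity(query, next_query) >= min_similarity
    return quick and similar
```

With these defaults, "python sort list" followed 40 seconds later by "python sort list descending" is flagged, while the same reformulation 15 minutes later, or a quick switch to an unrelated query, is not.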

See also: Personalized Models of Search Satisfaction
See also: Identifying Web Search Query Reformulation using Concept based Matching

The wireless data drain of users, apps, & platforms

October 16, 2013 Comments off

The wireless data drain of users, apps, & platforms
Source: Microsoft Research

Cellular data consumption is an important issue for users and network operators. However, little is understood about data consumption differences between similar apps, smartphone platforms, and different classes of users. We examine data consumption behavior in the lab, comparing different apps of the same category, comparing the same top apps across different platforms, and comparing network APIs that apps use across different platforms. We also collect data from 387 Android users in India, where users pay for cellular data consumed, with little prevalence of unlimited data plans. Our findings can inform users on how their choice of platform and apps has a drastic impact on their data bill. Our findings can also inform operators on how to use incentives to induce desired data consumption.

Studying from Electronic Textbooks

October 7, 2013 Comments off

Studying from Electronic Textbooks
Source: Microsoft Research

We present study navigator, an algorithmically-generated aid for enhancing the experience of studying from electronic textbooks. The study navigator for a section of the book consists of helpful concept references for understanding this section. Each concept reference is a pair consisting of a concept phrase explained elsewhere and the link to the section in which it has been explained. We propose a novel reader model for textbooks and an algorithm for generating the study navigator based on this model. We also present the results of an extensive user study that demonstrates the efficacy of the proposed system across textbooks on different subjects from different grades.

Are Some Tweets More Interesting Than Others? #HardQuestion

October 3, 2013 Comments off

Are Some Tweets More Interesting Than Others? #HardQuestion
Source: Microsoft Research

Twitter has evolved into a significant communication nexus, coupling personal and highly contextual utterances with local news, memes, celebrity gossip, headlines, and other microblogging subgenres. If we take Twitter as a large and varied dynamic collection, how can we predict which tweets will be interesting to a broad audience in advance of lagging social indicators of interest such as retweets? The telegraphic form of tweets, coupled with the subjective notion of interestingness, makes it difficult for human judges to agree on which tweets are indeed interesting.

In this paper, we address two questions: Can we develop a reliable strategy that results in high-quality labels for a collection of tweets, and can we use this labeled collection to predict a tweet’s interestingness? To answer the first question, we performed a series of studies using crowdsourcing to reach a diverse set of workers who served as a proxy for an audience with variable interests and perspectives. This method allowed us to explore different labeling strategies, including varying the judges, the labels they applied, the datasets, and other aspects of the task. To address the second question, we used crowdsourcing to assemble a set of tweets rated as interesting or not; we scored these tweets using textual and contextual features; and we used these scores as inputs to a binary classifier. We were able to achieve moderate agreement (kappa = 0.52) between the best classifier and the human assessments, a figure which reflects the challenges of the judgment task.
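
For readers unfamiliar with the agreement statistic, the Python sketch below computes Cohen's kappa for two label sequences, which corrects raw agreement for the agreement expected by chance. The abstract does not say which kappa variant was used, so treating it as Cohen's kappa between classifier output and human labels is an assumption.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable],
                 labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two equal-length label lists."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label]
              for label in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; agreement is trivial.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)
```

A kappa of 0 means agreement no better than chance and 1 means perfect agreement, so a value near 0.5 sits in the range commonly described as moderate.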

Towards a Holistic Data Center Simulator

September 19, 2013 Comments off

Towards a Holistic Data Center Simulator
Source: Microsoft Research

Data center (DC) design has become increasingly important with the rapid growth of cloud computing and online services. This rapid growth makes data centers significant consumers on the energy grid. Differences in environmental operating conditions, energy price and availability, network bandwidth and latency, as well as unpredictable user demand pose significant challenges for determining the right size, density, and energy sources for data centers. Data from real data centers is often proprietary, which severely limits academia and research institutions in addressing these challenges. Building a data center testbed for research is not only cost prohibitive (e.g., a 1 MW datacenter costs approximately $10 million to $22 million [1]); such a testbed is also difficult to upgrade continually or to use for exploring diversified technologies and industry practices.

Existing modeling, design methodologies and tools are not capable of capturing the scale and heterogeneity in complex systems like data centers. To effectively model performance, energy consumption, energy technologies, network, server trends, failure recovery, and varied operational scenarios, we propose coordinated research efforts to build a DC level full system modeling and simulation platform that enables researchers to investigate multiple DC design aspects for energy and resource efficiency.

Sensing the Pulse of Urban Refueling Behavior

September 5, 2013 Comments off

Sensing the Pulse of Urban Refueling Behavior
Source: Microsoft Research

Urban transportation is increasingly studied due to its complexity and economic importance. It is also a major component of urban energy use and pollution. The importance of this topic will only increase as urbanization continues around the world. A less researched aspect of transportation is the refueling behavior of drivers. In this paper, we propose a step toward real-time sensing of refueling behavior and citywide petrol consumption. We use reported trajectories from a fleet of GPS-equipped taxicabs to detect gas station visits, measure the time spent, and estimate overall demand. For times and stations with sparse data, we use collaborative filtering to estimate conditions. Our system provides real-time estimates of gas stations’ waiting times, from which recommendations could be made, an indicator of overall gas usage, from which macro-scale economic decisions could be made, and a geographic view of the efficiency of gas station placement.

