The Shortest Path to Happiness: Recommending Beautiful, Quiet, and Happy Routes in the City

July 14, 2014

The Shortest Path to Happiness: Recommending Beautiful, Quiet, and Happy Routes in the City

When providing directions to a place, web and mobile mapping services are all able to suggest the shortest route. The goal of this work is to automatically suggest routes that are not only short but also emotionally pleasant. To quantify the extent to which urban locations are pleasant, we use data from a crowd-sourcing platform that shows two street scenes in London (out of hundreds), and a user votes on which one looks more beautiful, quiet, and happy. We consider votes from more than 3.3K individuals and translate them into quantitative measures of location perceptions. We arrange those locations into a graph upon which we learn pleasant routes. Based on a quantitative validation, we find that, compared to the shortest routes, the recommended ones add just a few extra walking minutes and are indeed perceived to be more beautiful, quiet, and happy. To test the generality of our approach, we consider Flickr metadata of more than 3.7M pictures in London and 1.3M in Boston, compute proxies for the crowdsourced beauty dimension (the one for which we have collected the most votes), and evaluate those proxies with 30 participants in London and 54 in Boston. These participants have not only rated our recommendations but have also carefully motivated their choices, providing insights for future work.
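The idea of arranging scored locations into a graph and trading a little extra distance for pleasantness can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not spell out. It is a plain Dijkstra search over a blended edge cost, and the graph layout, the `beauty` scores in [0, 1], and the `alpha` weight are all assumptions made for illustration.

```python
import heapq


def pleasant_route(graph, start, goal, alpha=0.5):
    """Dijkstra over a cost that blends walking distance with unpleasantness.

    graph: {node: [(neighbor, distance_m, beauty), ...]} with beauty in [0, 1]
           (hypothetical layout, not taken from the paper).
    alpha: 0 gives the plain shortest path; larger values favor beautiful edges.
    """
    queue = [(0.0, start, [start])]
    best = {start: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, distance, beauty in graph.get(node, []):
            # Inflate an edge's length in proportion to how unattractive it is.
            new_cost = cost + distance * (1.0 + alpha * (1.0 - beauty))
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return None


# Toy street graph: the A-B-D route is shorter, the A-C-D route is prettier.
toy_graph = {
    "A": [("B", 100, 0.1), ("C", 70, 0.9)],
    "B": [("D", 100, 0.1)],
    "C": [("D", 150, 0.9)],
    "D": [],
}

print(pleasant_route(toy_graph, "A", "D", alpha=0.0))
print(pleasant_route(toy_graph, "A", "D", alpha=1.0))
```

With `alpha=0.0` the search reduces to ordinary shortest-path routing. Raising `alpha` makes the slightly longer but better-rated streets win, which mirrors the "few extra walking minutes" trade-off described above.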

Hat tip: ResearchBuzz

Privacy and Security in the Genomic Era

May 15, 2014

Privacy and Security in the Genomic Era (PDF)

Genome sequencing technology has advanced at a rapid pace and it is now possible to generate highly-detailed genotypes inexpensively. The collection and analysis of such data has the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy, notably because the genome has certain essential features, which include (but are not limited to) (i) an association with certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While computer scientists have addressed data privacy for various data types, less attention has been dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field and we report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state-of-the-art regarding privacy attacks on genomic data and strategies for mitigating such attacks, as well as contextualizing these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward.

Social Media — Can Cascades be Predicted?

April 7, 2014

Can Cascades be Predicted?

On many social networking web sites such as Facebook and Twitter, resharing or reposting functionality allows users to share others’ content with their own friends or followers. As content is reshared from user to user, large cascades of reshares can form. While a growing body of research has focused on analyzing and characterizing such cascades, a recent, parallel line of work has argued that the future trajectory of a cascade may be inherently unpredictable. In this work, we develop a framework for addressing cascade prediction problems. On a large sample of photo reshare cascades on Facebook, we find strong performance in predicting whether a cascade will continue to grow in the future. We find that the relative growth of a cascade becomes more predictable as we observe more of its reshares, that temporal and structural features are key predictors of cascade size, and that initially, breadth, rather than depth, in a cascade is a better indicator of larger cascades. This prediction performance is robust in the sense that multiple distinct classes of features all achieve similar performance. We also discover that temporal features are predictive of a cascade’s eventual shape. Observing independent cascades of the same content, we find that while these cascades differ greatly in size, we are still able to predict which ends up the largest.

Home Location Identification of Twitter Users

March 27, 2014

Home Location Identification of Twitter Users

We present a new algorithm for inferring the home location of Twitter users at different granularities, including city, state, time zone or geographic region, using the content of users’ tweets and their tweeting behavior. Unlike existing approaches, our algorithm uses an ensemble of statistical and heuristic classifiers to predict locations and makes use of a geographic gazetteer dictionary to identify place-name entities. We find that a hierarchical classification approach, where time zone, state or geographic region is predicted first and city is predicted next, can improve prediction accuracy. We have also analyzed movement variations of Twitter users, built a classifier to predict whether a user was travelling in a certain period of time, and used that to further improve the location detection accuracy. Experimental evidence suggests that our algorithm works well in practice and outperforms the best existing algorithms for predicting the home location of Twitter users.

Hat tip: ResearchBuzz

Statistical Signs of Social Influence on Suicides

February 24, 2014

Statistical Signs of Social Influence on Suicides

Certain currents in sociology consider society as being composed of autonomous individuals with independent psychologies. Others, however, deem our actions as strongly influenced by the accepted standards of social behavior. The latter view was central to the positivist conception of society when in 1897 Émile Durkheim published his monograph Suicide (Durkheim, 1897). By treating suicide as a social fact, Durkheim envisaged that suicide rates should be determined by the connections (or the lack of them) between people and society. Under the same framework, Durkheim considered that crime is bound up with the fundamental conditions of all social life and serves a social function. In this sense, and regardless of their extremely deviant nature, crime events are somehow capable of releasing certain social tensions and so have a purging effect on society. The social effect on the occurrence of homicides has been previously substantiated (Bettencourt et al., 2007; Alves et al., 2013), and is confirmed here, in terms of a superlinear scaling relation: doubling the population of a Brazilian city results in an average increment of 135% in the number of homicides, rather than the expected isometric increase of 100%, as found, for example, for the mortality due to car crashes. Here we present statistical signs of the social influence on the suicide occurrence in cities. Differently from homicides (superlinear) and fatal events in car crashes (isometric), we find sublinear scaling behavior between the number of suicides and city population, with allometric power-law exponents, β=0.836±0.009 and 0.870±0.002, for all cities in Brazil and the US, respectively. The fact that the frequency of suicides is disproportionately small for larger cities reveals a surprisingly beneficial aspect of living and interacting in larger and more complex social networks.
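To make the scaling language concrete: under a power law Y = c·N^β, doubling the population N multiplies the count Y by 2^β, whatever the prefactor c. The short sketch below converts between an exponent and the percent change on doubling. The function names are illustrative, and reading the homicide figure as a 135% increase on doubling is an interpretation of the abstract's wording.

```python
import math


def increase_on_doubling(beta):
    """Percent change in Y when N doubles, assuming Y = c * N**beta."""
    return (2.0 ** beta - 1.0) * 100.0


def exponent_from_doubling(percent_increase):
    """Inverse relation: the beta implied by a given percent increase on doubling."""
    return math.log2(1.0 + percent_increase / 100.0)


# Exponents quoted in the abstract, plus the isometric case for comparison.
for label, beta in [
    ("suicides, Brazil", 0.836),
    ("suicides, US", 0.870),
    ("isometric (e.g. car crashes)", 1.0),
]:
    print(f"{label}: beta={beta} -> {increase_on_doubling(beta):.1f}% on doubling")

# Exponent implied by a 135% increase on doubling (the homicide case).
print(f"implied homicide exponent: {exponent_from_doubling(135.0):.3f}")
```

Any exponent below 1 gives less than a 100% increase on doubling, which is what "sublinear" means here: a city twice as large has fewer than twice as many suicides.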

Bots vs. Wikipedians, Anons vs. Logged-Ins

February 18, 2014

Bots vs. Wikipedians, Anons vs. Logged-Ins

Wikipedia is a global crowdsourced encyclopedia that at time of writing is available in 287 languages. Wikidata is a likewise global crowdsourced knowledge base that provides shared facts to be used by Wikipedias. In the context of this research, we have developed an application and an underlying Application Programming Interface (API) capable of monitoring realtime edit activity of all language versions of Wikipedia and Wikidata. This application allows us to easily analyze edits in order to answer questions such as “Bots vs. Wikipedians, who edits more?”, “Which is the most anonymously edited Wikipedia?”, or “Who are the bots and what do they edit?”. To the best of our knowledge, this is the first time such an analysis could be done in realtime for Wikidata and for all Wikipedias, large and small. Our application is publicly available online, and its code has been open-sourced under the Apache 2.0 license.

User recommendation in reciprocal and bipartite social networks — a case study of online dating

December 12, 2013

User recommendation in reciprocal and bipartite social networks – a case study of online dating (PDF)

Many social networks in our daily life are bipartite networks built on reciprocity. How can we recommend users/friends to a user, so that the user is interested in and attractive to recommended users? In this research, we propose a new collaborative filtering model to improve user recommendations in reciprocal and bipartite social networks. The model considers a user’s “taste” in picking others and “attractiveness” in being picked by others. A case study of an online dating network shows that the new model has good performance in recommending both initial and reciprocal contacts.

See: Love Connection: Advice for Online Daters (Science Daily)

Bayesian Analysis of Epidemics – Zombies, Influenza, and other Diseases

December 4, 2013

Bayesian Analysis of Epidemics – Zombies, Influenza, and other Diseases

Mathematical models of epidemic dynamics offer significant insight into predicting and controlling infectious diseases. The dynamics of a disease model generally follow a susceptible, infected, and recovered (SIR) model, with some standard modifications. In this paper, we extend the work of Munz (2009) on the application of disease dynamics to the so-called “zombie apocalypse”, and then apply the identical methods to influenza dynamics. Unlike Munz (2009), we include data taken from specific depictions of zombies in popular culture films and apply Markov Chain Monte Carlo (MCMC) methods on improved dynamical representations of the system. To demonstrate the usefulness of this approach, beyond the entertaining example, we apply the identical methodology to Google Trends data on influenza to establish infection and recovery rates. Finally, we discuss the use of the methods to explore hypothetical intervention policies regarding disease outbreaks.
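For readers unfamiliar with the SIR model mentioned above, here is a minimal forward simulation of the textbook equations dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI, using simple Euler steps. It covers only the dynamical model, not the paper's MCMC estimation or its zombie-specific modifications, and the parameter values are made up for illustration.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the basic SIR equations with fixed-size Euler steps.

    beta:  infection rate per day (illustrative value, not from the paper)
    gamma: recovery rate per day (illustrative value, not from the paper)
    Returns a list of (time, susceptible, infected, recovered) tuples.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population, constant in this model
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history


history = simulate_sir(beta=0.5, gamma=0.1, s0=990, i0=10, r0=0, days=60)
peak_time, _, peak_infected, _ = max(history, key=lambda row: row[2])
print(f"peak of {peak_infected:.0f} infected at day {peak_time:.1f}")
```

Fitting `beta` and `gamma` to observed data, as the paper does with MCMC, amounts to searching for parameter values whose simulated curve best matches the observations.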

Determinants of the Pace of Global Innovation in Energy Technologies

October 21, 2013

Determinants of the Pace of Global Innovation in Energy Technologies

Understanding the factors driving innovation in energy technologies is of critical importance to mitigating climate change and addressing other energy-related global challenges. Low levels of innovation, measured in terms of energy patent filings, were noted in the 1980s and 90s as an issue of concern and were attributed to low investment in public and private research and development (R&D). Here we build a comprehensive global database of energy patents covering the period 1970-2009 which is unique in its temporal and geographical scope. Analysis of the data reveals a recent, marked departure from historical trends. A sharp increase in rates of patenting has occurred over the last decade, particularly in renewable technologies, despite continued low levels of R&D funding. To solve the puzzle of fast innovation despite modest R&D increases we develop a model that explains the nonlinear response observed in the empirical data of technological innovation to various types of investment. The model reveals a regular relationship between patents, R&D funding, and growing markets across technologies, and accurately predicts patenting rates at different stages of technological maturity and market development. We show quantitatively how growing markets have formed a vital complement to public R&D in driving innovative activity; these two forms of investment have each leveraged the effect of the other in driving patenting trends over long periods of time.

Resurrecting My Revolution: Using Social Link Neighborhood in Bringing Context to the Disappearing Web

September 20, 2013

Resurrecting My Revolution: Using Social Link Neighborhood in Bringing Context to the Disappearing Web

In previous work we reported that resources linked in tweets disappeared at the rate of 11% in the first year followed by 7.3% each year afterwards. We also found that in the first year 6.7%, and 14.6% in each subsequent year, of the resources were archived in public web archives. In this paper we revisit the same dataset of tweets and find that our prior model still holds and the calculated error for estimating percentages missing was about 4%, but we found the rate of archiving produced a higher error of about 11.5%. We also discovered that resources have disappeared from the archives themselves (7.89%) as well as reappeared on the live web after being declared missing (6.54%). We have also tested the availability of the tweets themselves and found that 10.34% have disappeared from the live web. To mitigate the loss of resources on the live web, we propose the use of a “tweet signature”. Using the Topsy API, we extract the top five most frequent terms from the union of all tweets about a resource, and use these five terms as a query to Google. We found that using tweet signatures results in discovering replacement resources with 70+% textual similarity to the missing resource 41% of the time.
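The “tweet signature” step described above (take the most frequent terms across all tweets about a resource and use them as a search query) can be sketched as follows. The tokenizer, the small stopword list, and the function name are assumptions; the abstract does not say how terms were normalized, and the Topsy and Google calls are left out.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; the paper's actual filtering is not specified.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "for", "on", "at", "rt"}


def tweet_signature(tweets, size=5):
    """Return the `size` most frequent terms across all tweets about a resource."""
    counts = Counter()
    for tweet in tweets:
        # Drop URLs so link fragments do not pollute the term counts.
        text = re.sub(r"https?://\S+", " ", tweet.lower())
        words = re.findall(r"[a-z0-9']+", text)
        counts.update(word for word in words if word not in STOPWORDS)
    return [term for term, _ in counts.most_common(size)]


sample_tweets = [
    "Protest in Tahrir Square tonight http://example.com/a",
    "RT Tahrir Square protest grows",
    "Tahrir protest video",
]
query = " ".join(tweet_signature(sample_tweets))
print(query)
```

The resulting string would then be submitted to a search engine, and candidate pages compared against the missing resource for textual similarity.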

How are mortality rates affected by population density?

July 20, 2013

How are mortality rates affected by population density?

Biologists have found that the death rate of cells in culture depends upon their spatial density. Permanent “Stay alive” signals from their neighbours seem to prevent them from dying. In a previous paper (Wang et al. 2013) we gave evidence for a density effect for ants. In this paper we examine whether there is a similar effect in human demography. We find that although there is no observable relationship between population density and overall death rates, there is a clear relationship between density and the death rates of young age-groups. Basically, their death rates decrease with increasing density. However, this relationship breaks down around 300 inhabitants per square kilometre. Above this threshold the death rates remain fairly constant. The same density effect is observed in Canada, France, Japan and the United States. We also observe a striking parallel between the density effect and the so-called marital status effect in the sense that they both lead to higher suicide rates and are both enhanced for younger age-groups. However, it should be noted that the strength of the density effect is only a fraction of the strength of the marital status effect. Although this parallel does not give us an explanation by itself, it invites us to focus on explanations that apply to both effects. In this light the “Stay alive” paradigm set forth by Prof. Martin Raff appears as a natural interpretation. It can be seen as an extension of the “social ties” framework proposed at the end of the 19th century by the sociologist Emile Durkheim in his study about suicide.

Not all paths lead to Rome: Analysing the network of sister cities

May 30, 2013

Not all paths lead to Rome: Analysing the network of sister cities (PDF)


This work analyses the practice of sister city pairing. We investigate structural properties of the resulting city and country networks and present rankings of the most central nodes in these networks. We identify different country clusters and find that the practice of sister city pairing is not influenced by geographical proximity but results in highly assortative networks.

See: Does Being ‘Sister Cities’ Really Mean Anything? (Atlantic Cities)

Happiness and the Patterns of Life: A Study of Geolocated Tweets

May 1, 2013

Happiness and the Patterns of Life: A Study of Geolocated Tweets


The patterns of life exhibited by large populations have been described and modeled both as a basic science exercise and for a range of applied goals such as reducing automotive congestion, improving disaster response, and even predicting the location of individuals. However, these studies previously had limited access to conversation content, rendering changes in expression as a function of movement invisible. In addition, they typically use the communication between a mobile phone and its nearest antenna tower to infer position, limiting the spatial resolution of the data to the geographical region serviced by each cellphone tower. We use a collection of 37 million geolocated tweets to characterize the movement patterns of 180,000 individuals, taking advantage of several orders of magnitude of increased spatial accuracy relative to previous work. Employing the recently developed sentiment analysis instrument known as the hedonometer, we characterize changes in word usage as a function of movement, and find that expressed happiness increases logarithmically with distance from an individual’s average location.

How does group interaction and its severance affect life expectancy?

April 24, 2013

How does group interaction and its severance affect life expectancy?


The phenomenon of apoptosis observed in cell cultures consists in the fact that unless cells permanently receive a "Stay alive" signal from their neighbors, they are bound to die. A natural question is whether manifestations of this apoptosis paradigm can also be observed in other organizations of living organisms. In this paper we report results from a two-year long campaign of experiments on three species of ants and one species of (tephritid) fruit flies. In these experiments individuals were separated from their colony and kept in isolation either alone or in groups of 10 individuals. The overall conclusion is that "singles" have a shorter life expectancy than individuals in the groups of 10. This observation holds for ants as well as for fruit flies. The paper also provides compelling evidence of a similar effect in married versus unmarried (i.e. single, widowed or divorced) people. A natural question concerns the dynamic of the transition between the two regimes. Observation suggests an abrupt (rather than smooth) transition and this conclusion seems to hold for ants, fruit flies and humans as well. We call it a shock transition. In addition, for red fire ants Solenopsis invicta, it was observed that individuals in groups of 10 that also include one queen die much faster than those in similar groups without queens. The paper also examines the corresponding survivorship curves from the perspective of the standard classification into 3 types. The survivorship curves of ants (whether single or in groups of 10) are found to be of type II whereas those of the fruit fly Bactrocera dorsalis are rather of type III. In this connection it is recalled that the survivorship curve of the fruit fly Drosophila melanogaster is of type I, i.e. of the same type as for humans.

See: How People and Animals in Isolation Die Sooner (The Atlantic)

The Groupon Effect on Yelp Ratings: A Root Cause Analysis

March 11, 2013

The Groupon Effect on Yelp Ratings: A Root Cause Analysis (PDF)


Daily deals sites such as Groupon offer deeply discounted goods and services to tens of millions of customers through geographically targeted daily e-mail marketing campaigns. In our prior work we observed that a negative side effect for merchants using Groupons is that, on average, their Yelp ratings decline significantly. However, this previous work was essentially observational, rather than explanatory. In this work, we rigorously consider and evaluate various hypotheses about underlying consumer and merchant behavior in order to understand this phenomenon, which we dub the Groupon effect. We use statistical analysis and mathematical modeling, leveraging a dataset we collected spanning tens of thousands of daily deals and over 7 million Yelp reviews. In particular, we investigate hypotheses such as whether Groupon subscribers are more critical than their peers, or whether some fraction of Groupon merchants provide significantly worse service to customers using Groupons. We suggest an additional novel hypothesis: reviews from Groupon subscribers are lower on average because such reviews correspond to real, unbiased customers, while the body of reviews on Yelp contains some fraction of reviews from biased or even potentially fake sources. Although we focus on a specific question, our work provides broad insights into both consumer and merchant behavior within the daily deals marketplace.

The Geographic Flow of Music

April 25, 2012

The Geographic Flow of Music

The social media website provides a detailed snapshot of what its users in hundreds of cities listen to each week. After suitably normalizing this data, we use it to test three hypotheses related to the geographic flow of music. The first is that although many of the most popular artists are listened to around the world, music preferences are closely related to nationality, language, and geographic location. We find support for this hypothesis, with a couple of minor, yet interesting, exceptions. Our second hypothesis is that some cities are consistently early adopters of new music (and early to snub stale music). To test this hypothesis, we adapt a method previously used to detect the leadership networks present in flocks of birds. We find empirical support for the claim that a similar leadership network exists among cities, and this finding is the main contribution of the paper. Finally, we test the hypothesis that large cities tend to be ahead of smaller cities; we find only weak support for this hypothesis.

+ Full Paper (PDF)

Daily Deals: Prediction, Social Diffusion, and Reputational Ramifications

October 3, 2011

Daily Deals: Prediction, Social Diffusion, and Reputational Ramifications

Daily deal sites have become the latest Internet sensation, providing discounted offers to customers for restaurants, ticketed events, services, and other items. We begin by undertaking a study of the economics of daily deals on the web, based on a dataset we compiled by monitoring Groupon and LivingSocial sales in 20 large cities over several months. We use this dataset to characterize deal purchases; glean insights about operational strategies of these firms; and evaluate customers’ sensitivity to factors such as price, deal scheduling, and limited inventory. We then marry our daily deals dataset with additional datasets we compiled from Facebook and Yelp users to study the interplay between social networks and daily deal sites. First, by studying user activity on Facebook while a deal is running, we provide evidence that daily deal sites benefit from significant word-of-mouth effects during sales events, consistent with results predicted by cascade models. Second, we consider the effects of daily deals on the longer-term reputation of merchants, based on their Yelp reviews before and after they run a daily deal. Our analysis shows that while the number of reviews increases significantly due to daily deals, average rating scores from reviewers who mention daily deals are 10% lower than scores of their peers on average.

+ Full Paper (PDF)
