When providing directions to a place, web and mobile mapping services are all able to suggest the shortest route. The goal of this work is to automatically suggest routes that are not only short but also emotionally pleasant. To quantify the extent to which urban locations are pleasant, we use data from a crowd-sourcing platform that shows two street scenes in London (out of hundreds), and a user votes on which one looks more beautiful, quiet, and happy. We consider votes from more than 3.3K individuals and translate them into quantitative measures of location perceptions. We arrange those locations into a graph upon which we learn pleasant routes. Based on a quantitative validation, we find that, compared to the shortest routes, the recommended ones add just a few extra walking minutes and are indeed perceived to be more beautiful, quiet, and happy. To test the generality of our approach, we consider Flickr metadata of more than 3.7M pictures in London and 1.3M in Boston, compute proxies for the crowdsourced beauty dimension (the one for which we have collected the most votes), and evaluate those proxies with 30 participants in London and 54 in Boston. These participants have not only rated our recommendations but have also carefully motivated their choices, providing insights for future work.
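As a rough illustration of trading distance against pleasantness on a street graph, here is a minimal sketch. It is not the authors' method: the cost function, the `alpha` trade-off parameter, the beauty scores, and the toy graph are all hypothetical, and the search is plain Dijkstra.

```python
import heapq

# Hypothetical toy graph: node -> list of (neighbor, length_in_meters, beauty in [0, 1]).
EXAMPLE_GRAPH = {
    "A": [("B", 100.0, 0.1), ("C", 120.0, 0.9)],
    "B": [("D", 100.0, 0.1)],
    "C": [("D", 120.0, 0.9)],
    "D": [],
}

def best_route(graph, start, goal, alpha):
    """Dijkstra search where each edge costs length * (1 + alpha * (1 - beauty)).

    alpha = 0 gives the plain shortest route; a larger alpha penalizes
    unattractive segments more heavily (an assumed cost function).
    """
    heap = [(0.0, start, [start])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, length, beauty in graph.get(node, []):
            if neighbor not in visited:
                edge_cost = length * (1.0 + alpha * (1.0 - beauty))
                heapq.heappush(heap, (cost + edge_cost, neighbor, path + [neighbor]))
    return None  # goal is unreachable from start

if __name__ == "__main__":
    print(best_route(EXAMPLE_GRAPH, "A", "D", alpha=0.0))
    print(best_route(EXAMPLE_GRAPH, "A", "D", alpha=1.0))
```

With `alpha=0.0` the search picks the shorter route through B; with `alpha=1.0` the slightly longer but more attractive route through C becomes cheaper.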
Hat tip: ResearchBuzz
Privacy and Security in the Genomic Era (PDF)
Genome sequencing technology has advanced at a rapid pace and it is now possible to generate highly-detailed genotypes inexpensively. The collection and analysis of such data has the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy, notably because the genome has certain essential features, which include (but are not limited to) (i) an association with certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While computer scientists have addressed data privacy for various data types, less attention has been dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field and we report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state of the art regarding privacy attacks on genomic data and strategies for mitigating such attacks, and we contextualize these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward.
Statistical Signs of Social Influence on Suicides
Certain currents in sociology consider society as being composed of autonomous individuals with independent psychologies. Others, however, deem our actions as strongly influenced by the accepted standards of social behavior. The latter view was central to the positivist conception of society when in 1897 Émile Durkheim published his monograph Suicide (Durkheim, 1897). By treating suicide as a social fact, Durkheim envisaged that suicide rates should be determined by the connections (or the lack of them) between people and society. Under the same framework, Durkheim considered that crime is bound up with the fundamental conditions of all social life and serves a social function. In this sense, and regardless of its extremely deviant nature, crime events are somehow capable of releasing certain social tensions and so have a purging effect in society. The social effect on the occurrence of homicides has been previously substantiated (Bettencourt et al., 2007; Alves et al., 2013), and is confirmed here, in terms of a superlinear scaling relation: doubling the population of a Brazilian city results in an average increment of 135% in the number of homicides, rather than the expected isometric increase of 100%, as found, for example, for the mortality due to car crashes. Here we present statistical signs of the social influence on the occurrence of suicides in cities. Differently from homicides (superlinear) and fatal events in car crashes (isometric), we find sublinear scaling behavior between the number of suicides and city population, with allometric power-law exponents, β=0.836±0.009 and 0.870±0.002, for all cities in Brazil and the US, respectively. The fact that the frequency of suicides is disproportionately small for larger cities reveals a surprisingly beneficial aspect of living and interacting in larger and more complex social networks.
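To make the scaling figures concrete, here is a small sketch that assumes a pure power law Y = c·N^β between a count Y and city population N. Under that assumption, doubling N multiplies Y by 2^β. The function names are hypothetical, and the homicide exponent is not stated in the abstract; it is only back-calculated here from the quoted 135% increase.

```python
import math

def doubling_factor(beta):
    """Factor by which the count grows when population doubles, if Y = c * N**beta."""
    return 2.0 ** beta

def exponent_from_doubling_increase(increase):
    """Exponent implied by a fractional increase per doubling (1.35 means +135%)."""
    return math.log2(1.0 + increase)

if __name__ == "__main__":
    # Exponents quoted in the abstract for suicides (Brazil, US).
    for label, beta in [("Brazil", 0.836), ("US", 0.870)]:
        pct = (doubling_factor(beta) - 1.0) * 100.0
        print(f"{label}: doubling population raises suicides by about {pct:.1f}%")
    # Exponent implied by the 135% homicide figure (derived, not quoted).
    print(f"implied homicide exponent: {exponent_from_doubling_increase(1.35):.3f}")
```

An isometric relation (β = 1) gives exactly a 100% increase per doubling; exponents below 1 give less than that, which is the sublinear behavior reported for suicides.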
Bots vs. Wikipedians, Anons vs. Logged-Ins
Wikipedia is a global crowdsourced encyclopedia that at the time of writing is available in 287 languages. Wikidata is a likewise global crowdsourced knowledge base that provides shared facts to be used by Wikipedias. In the context of this research, we have developed an application and an underlying Application Programming Interface (API) capable of monitoring realtime edit activity of all language versions of Wikipedia and Wikidata. This application allows us to easily analyze edits in order to answer questions such as “Bots vs. Wikipedians, who edits more?”, “Which is the most anonymously edited Wikipedia?”, or “Who are the bots and what do they edit?”. To the best of our knowledge, this is the first time such an analysis could be done in realtime for Wikidata and for all Wikipedias, large and small. Our application is available publicly online at the URL this http URL, and its code has been open-sourced under the Apache 2.0 license.
Bayesian Analysis of Epidemics – Zombies, Influenza, and other Diseases
Mathematical models of epidemic dynamics offer significant insight into predicting and controlling infectious diseases. The dynamics of a disease model generally follow a susceptible, infected, and recovered (SIR) model, with some standard modifications. In this paper, we extend the work of Munz et al. (2009) on the application of disease dynamics to the so-called “zombie apocalypse”, and then apply the identical methods to influenza dynamics. Unlike Munz et al. (2009), we include data taken from specific depictions of zombies in popular culture films and apply Markov Chain Monte Carlo (MCMC) methods on improved dynamical representations of the system. To demonstrate the usefulness of this approach, beyond the entertaining example, we apply the identical methodology to Google Trends data on influenza to establish infection and recovery rates. Finally, we discuss the use of the methods to explore hypothetical intervention policies regarding disease outbreaks.
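For readers unfamiliar with the SIR model mentioned above, here is a minimal sketch of the standard equations integrated with a simple Euler step. It is not the paper's zombie variant and does no MCMC fitting; the rates, population sizes, and step size are hypothetical values chosen only for illustration.

```python
def simulate_sir(beta, gamma, s0, i0, r0, dt, steps):
    """Euler integration of the standard SIR equations.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    Returns a list of (S, I, R) tuples, one per time step.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population stays constant in this model
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

if __name__ == "__main__":
    # Hypothetical parameters: infection rate 0.3/day, recovery rate 0.1/day.
    run = simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0, dt=0.1, steps=1000)
    peak_infected = max(i for _, i, _ in run)
    print(f"peak infected: {peak_infected:.0f}")
    print(f"final susceptible: {run[-1][0]:.0f}")
```

Estimating `beta` and `gamma` from observed case data, as the paper does with MCMC, would wrap a simulator like this inside a likelihood function; that step is not shown here.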
Determinants of the Pace of Global Innovation in Energy Technologies
Understanding the factors driving innovation in energy technologies is of critical importance to mitigating climate change and addressing other energy-related global challenges. Low levels of innovation, measured in terms of energy patent filings, were noted in the 1980s and 90s as an issue of concern and were attributed to low investment in public and private research and development (R&D). Here we build a comprehensive global database of energy patents covering the period 1970-2009 which is unique in its temporal and geographical scope. Analysis of the data reveals a recent, marked departure from historical trends. A sharp increase in rates of patenting has occurred over the last decade, particularly in renewable technologies, despite continued low levels of R&D funding. To solve the puzzle of fast innovation despite modest R&D increases we develop a model that explains the nonlinear response observed in the empirical data of technological innovation to various types of investment. The model reveals a regular relationship between patents, R&D funding, and growing markets across technologies, and accurately predicts patenting rates at different stages of technological maturity and market development. We show quantitatively how growing markets have formed a vital complement to public R&D in driving innovative activity; these two forms of investment have each leveraged the effect of the other in driving patenting trends over long periods of time.
This work analyses the practice of sister city pairing. We investigate structural properties of the resulting city and country networks and present rankings of the most central nodes in these networks. We identify different country clusters and find that the practice of sister city pairing is not influenced by geographical proximity but results in highly assortative networks.
See: Does Being ‘Sister Cities’ Really Mean Anything? (Atlantic Cities)
Daily deals sites such as Groupon offer deeply discounted goods and services to tens of millions of customers through geographically targeted daily e-mail marketing campaigns. In our prior work we observed that a negative side effect for merchants using Groupons is that, on average, their Yelp ratings decline significantly. However, this previous work was essentially observational, rather than explanatory. In this work, we rigorously consider and evaluate various hypotheses about underlying consumer and merchant behavior in order to understand this phenomenon, which we dub the Groupon effect. We use statistical analysis and mathematical modeling, leveraging a dataset we collected spanning tens of thousands of daily deals and over 7 million Yelp reviews. In particular, we investigate hypotheses such as whether Groupon subscribers are more critical than their peers, or whether some fraction of Groupon merchants provide significantly worse service to customers using Groupons. We suggest an additional novel hypothesis: reviews from Groupon subscribers are lower on average because such reviews correspond to real, unbiased customers, while the body of reviews on Yelp contains some fraction of reviews from biased or even potentially fake sources. Although we focus on a specific question, our work provides broad insights into both consumer and merchant behavior within the daily deals marketplace.
Daily deal sites have become the latest Internet sensation, providing discounted offers to customers for restaurants, ticketed events, services, and other items. We begin by undertaking a study of the economics of daily deals on the web, based on a dataset we compiled by monitoring Groupon and LivingSocial sales in 20 large cities over several months. We use this dataset to characterize deal purchases; glean insights about operational strategies of these firms; and evaluate customers’ sensitivity to factors such as price, deal scheduling, and limited inventory. We then marry our daily deals dataset with additional datasets we compiled from Facebook and Yelp users to study the interplay between social networks and daily deal sites. First, by studying user activity on Facebook while a deal is running, we provide evidence that daily deal sites benefit from significant word-of-mouth effects during sales events, consistent with results predicted by cascade models. Second, we consider the effects of daily deals on the longer-term reputation of merchants, based on their Yelp reviews before and after they run a daily deal. Our analysis shows that while the number of reviews increases significantly due to daily deals, average rating scores from reviewers who mention daily deals are 10% lower than scores of their peers on average.
+ Full Paper (PDF)