Archive for the ‘arXiv.org’ Category

From “I love you babe” to “leave me alone” – Romantic Relationship Breakups on Twitter

October 13, 2014

Source: arXiv.org

We use public data from Twitter to study the breakups of the romantic relationships of 661 couples. Couples are identified through profile references such as @user1 writing “@user2 is the best boyfriend ever!!”. Using this data set, we find evidence for a number of existing hypotheses describing psychological processes, including (i) pre-relationship closeness being indicative of post-relationship closeness, (ii) “stonewalling”, i.e., ignoring messages by a partner, being indicative of a pending breakup, and (iii) post-breakup depression. We also observe a previously undocumented phenomenon of “batch un-friending and being un-friended”, where users who break up experience sudden drops of 15-20 followers and friends. Our work shows that public Twitter data can be used to gain new insights into psychological processes surrounding relationship dissolutions, something that most people go through at least once in their lifetime.

Hat tip: ResearchBuzz

About the size of Google Scholar: playing the numbers

October 7, 2014

Source: arXiv.org

The emergence of academic search engines (essentially Google Scholar and Microsoft Academic Search) has revived and increased interest in the size of the academic web, since their aspiration is to index the entirety of current academic knowledge. Search engine functionality and human search patterns sometimes lead us to believe that what we see in the search engine’s results page is all that really exists. And, even when this is not true, we wonder which information is missing and why. The main objective of this working paper is to calculate the size of Google Scholar at present (May 2014). To do this, we present, apply and discuss four empirical methods: Khabsa & Giles’s method, an estimate based on empirical data, and estimates based on direct queries and absurd queries. The results, despite providing disparate values, place the estimated size of Google Scholar at about 160 million documents. However, the fact that all methods show great inconsistencies, limitations and uncertainties makes us wonder why Google does not simply provide this information to the scientific community, if the company really knows this figure.

Hat tip: ResearchBuzz

Online Social Networks: Threats and Solutions

August 11, 2014

Source: arXiv.org

Many online social network (OSN) users are unaware of the numerous security risks that exist in these networks, including privacy violations, identity theft, and sexual harassment. According to recent studies, OSN users readily expose personal and private details about themselves, such as relationship status, date of birth, school name, email address, phone number, and even home address. This information, if it falls into the wrong hands, can be used to harm users both in the virtual world and in the real world. These risks become even more severe when the users are children. In this paper we present a thorough review of the different security and privacy risks that threaten the well-being of OSN users in general, and children in particular. In addition, we present an overview of existing solutions that can provide better protection, security, and privacy for OSN users. We also offer simple-to-implement recommendations for OSN users that can improve their security and privacy when using these platforms. Furthermore, we suggest future research directions.

Hat tip: INFOdocket

The Shortest Path to Happiness: Recommending Beautiful, Quiet, and Happy Routes in the City

July 14, 2014

Source: arXiv.org

When providing directions to a place, web and mobile mapping services are all able to suggest the shortest route. The goal of this work is to automatically suggest routes that are not only short but also emotionally pleasant. To quantify the extent to which urban locations are pleasant, we use data from a crowd-sourcing platform that shows two street scenes in London (out of hundreds), and a user votes on which one looks more beautiful, quiet, and happy. We consider votes from more than 3.3K individuals and translate them into quantitative measures of location perceptions. We arrange those locations into a graph upon which we learn pleasant routes. Based on a quantitative validation, we find that, compared to the shortest routes, the recommended ones add just a few extra walking minutes and are indeed perceived to be more beautiful, quiet, and happy. To test the generality of our approach, we consider Flickr metadata of more than 3.7M pictures in London and 1.3M in Boston, compute proxies for the crowdsourced beauty dimension (the one for which we have collected the most votes), and evaluate those proxies with 30 participants in London and 54 in Boston. These participants have not only rated our recommendations but have also carefully motivated their choices, providing insights for future work.
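As a rough illustration of the idea of routing over a graph of scored locations, here is a minimal sketch. It is not the paper's method: the cost function, the `alpha` trade-off parameter, the `pleasant_route` name, and the toy graph are all assumptions made up for this example. Each street segment carries a length and a beauty score in [0, 1], and a standard Dijkstra search penalizes segments with low scores.

```python
import heapq

def pleasant_route(graph, start, goal, alpha=1.0):
    """Dijkstra over a cost that penalizes low-beauty segments.

    graph: {node: [(neighbor, length_m, beauty), ...]}, beauty in [0, 1].
    alpha: hypothetical trade-off knob; alpha=0 gives the plain shortest route.
    """
    best = {start: 0.0}          # cheapest known cost to each node
    prev = {}                    # back-pointers for path reconstruction
    heap = [(0.0, start)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue             # stale heap entry
        for nbr, length, beauty in graph.get(node, []):
            # Assumed cost: length inflated by how unpleasant the segment is.
            new_cost = cost + length * (1.0 + alpha * (1.0 - beauty))
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if goal not in best:
        return None              # goal unreachable
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

# Toy graph with made-up lengths (meters) and beauty scores.
TOY = {
    "A": [("B", 100, 0.2), ("C", 120, 0.9)],
    "B": [("D", 100, 0.2)],
    "C": [("D", 110, 0.9)],
    "D": [],
}

print(pleasant_route(TOY, "A", "D", alpha=0.0))  # shortest route
print(pleasant_route(TOY, "A", "D", alpha=1.0))  # beauty-weighted route
```

In this toy graph the beauty-weighted route is 30 meters longer than the shortest one, which mirrors the abstract's point that a more pleasant route need not add much distance.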

Hat tip: ResearchBuzz

Privacy and Security in the Genomic Era

May 15, 2014

Privacy and Security in the Genomic Era (PDF)
Source: arXiv.org

Genome sequencing technology has advanced at a rapid pace and it is now possible to generate highly detailed genotypes inexpensively. The collection and analysis of such data has the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy, notably because the genome has certain essential features, which include (but are not limited to) (i) an association with certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While computer scientists have addressed data privacy for various data types, less attention has been dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field and we report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state of the art regarding privacy attacks on genomic data and strategies for mitigating such attacks, and we contextualize these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward.

Social Media — Can Cascades be Predicted?

April 7, 2014

Can Cascades be Predicted?
Source: arXiv.org

On many social networking web sites such as Facebook and Twitter, resharing or reposting functionality allows users to share others’ content with their own friends or followers. As content is reshared from user to user, large cascades of reshares can form. While a growing body of research has focused on analyzing and characterizing such cascades, a recent, parallel line of work has argued that the future trajectory of a cascade may be inherently unpredictable. In this work, we develop a framework for addressing cascade prediction problems. On a large sample of photo reshare cascades on Facebook, we find strong performance in predicting whether a cascade will continue to grow in the future. We find that the relative growth of a cascade becomes more predictable as we observe more of its reshares, that temporal and structural features are key predictors of cascade size, and that initially, breadth, rather than depth, in a cascade is a better indicator of larger cascades. This prediction performance is robust in the sense that multiple distinct classes of features all achieve similar performance. We also discover that temporal features are predictive of a cascade’s eventual shape. Observing independent cascades of the same content, we find that while these cascades differ greatly in size, we are still able to predict which ends up the largest.

Home Location Identification of Twitter Users

March 27, 2014

Source: arXiv.org

We present a new algorithm for inferring the home location of Twitter users at different granularities, including city, state, time zone or geographic region, using the content of users’ tweets and their tweeting behavior. Unlike existing approaches, our algorithm uses an ensemble of statistical and heuristic classifiers to predict locations and makes use of a geographic gazetteer dictionary to identify place-name entities. We find that a hierarchical classification approach, where time zone, state or geographic region is predicted first and city is predicted next, can improve prediction accuracy. We have also analyzed movement variations of Twitter users, built a classifier to predict whether a user was travelling in a certain period of time, and used that to further improve the location detection accuracy. Experimental evidence suggests that our algorithm works well in practice and outperforms the best existing algorithms for predicting the home location of Twitter users.

Hat tip: ResearchBuzz
