XRay: Enhancing the Web’s Transparency with Differential Correlation (PDF)
Source: Columbia University
Today’s Web services – such as Google, Amazon, and Facebook – leverage user data for varied purposes, including personalizing recommendations, targeting advertisements, and adjusting prices. At present, users have little insight into how their data is being used. Hence, they cannot make informed choices about the services they choose.
To increase transparency, we developed XRay, the first fine-grained, robust, and scalable personal data tracking system for the Web. XRay predicts which data in an arbitrary Web account (such as emails, searches, or viewed products) is being used to target which outputs (such as ads, recommended products, or prices). XRay’s core functions are service agnostic and easy to instantiate for new services, and they can track data within and across services. To make predictions independent of the audited service, XRay relies on the following insight: by comparing outputs from different accounts with similar, but not identical, subsets of data, one can pinpoint targeting through correlation. We show both theoretically, and through experiments on Gmail, Amazon, and YouTube, that XRay achieves high precision and recall by correlating data from a surprisingly small number of extra accounts.
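The correlation insight described above can be illustrated with a small sketch. This is not XRay's actual algorithm, which the abstract does not specify. It assumes a simple scoring rule (the rate at which an output appears in accounts that hold an item, minus the rate in accounts that do not), and the account names, email labels, and the function `rank_inputs` are all hypothetical.

```python
def rank_inputs(accounts, saw_output):
    """Score each data item by how much its presence raises the output's rate.

    accounts   -- dict: account name -> set of data items placed in that account
    saw_output -- set of account names in which the output (e.g. an ad) appeared
    """
    items = set().union(*accounts.values())
    scores = {}
    for item in items:
        with_item = [a for a, data in accounts.items() if item in data]
        without_item = [a for a, data in accounts.items() if item not in data]
        if not with_item or not without_item:
            continue  # no contrast between accounts, so nothing to compare
        rate_with = sum(a in saw_output for a in with_item) / len(with_item)
        rate_without = sum(a in saw_output for a in without_item) / len(without_item)
        scores[item] = rate_with - rate_without
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical audit: four accounts holding overlapping subsets of three emails.
accounts = {
    "acct_a": {"email_1", "email_2"},
    "acct_b": {"email_1", "email_3"},
    "acct_c": {"email_2", "email_3"},
    "acct_d": {"email_3"},
}
saw_ad = {"acct_a", "acct_b"}  # the ad appeared only in these accounts
ranking = rank_inputs(accounts, saw_ad)
```

In this toy setup the ad appears in exactly the accounts that contain `email_1`, so that item ranks first. Because the accounts overlap without being identical, four accounts are enough to separate three candidate items.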
A Measurement Study of Google Play (PDF)
Source: Columbia University
Although millions of users download and use third-party Android applications from the Google Play store, little information is known on an aggregated level about these applications. We have built PlayDrone, the first scalable Google Play store crawler, and used it to index and analyze over 1,100,000 applications in the Google Play store on a daily basis, the largest such index of Android applications. PlayDrone leverages various hacking techniques to circumvent Google’s roadblocks for indexing Google Play store content, and makes proprietary application sources available, including source code for over 880,000 free applications. We demonstrate the usefulness of PlayDrone in decompiling and analyzing application content by exploring four previously unaddressed issues: the characterization of Google Play application content at large scale and its evolution over time, library usage in applications and its impact on application portability, duplicative application content in Google Play, and the ineffectiveness of OAuth and related service authentication mechanisms resulting in malicious users being able to easily gain unauthorized access to user data and resources on Amazon Web Services and Facebook.
The Art and Science of Data-Driven Journalism (PDF)
Source: Tow Center for Digital Journalism, Columbia University School of Journalism
Journalists have been using data in their stories for as long as the profession has existed. A revolution in computing in the 20th century created opportunities for data integration into investigations, as journalists began to bring technology into their work. In the 21st century, a revolution in connectivity is leading the media toward new horizons. The Internet, cloud computing, agile development, mobile devices, and open source software have transformed the practice of journalism, leading to the emergence of a new term: data journalism.
Although journalists have been using data in their stories for as long as they have been engaged in reporting, data journalism is more than traditional journalism with more data. Decades after early pioneers successfully applied computer-assisted reporting and social science to investigative journalism, journalists are creating news apps and interactive features that help people understand data, explore it, and act upon the insights derived from it. New business models are emerging in which data is a raw material for profit, impact, and insight, co-created with an audience that was formerly reduced to passive consumption. Journalists around the world are grappling with the excitement and the challenge of telling compelling stories by harnessing the vast quantity of data that our increasingly networked lives, devices, businesses, and governments produce every day.
While the potential of data journalism is immense, the pitfalls and challenges to its adoption throughout the media are similarly significant, from digital literacy to competition for scarce resources in newsrooms. Global threats to press freedom, digital security, and limited access to data create difficult working conditions for journalists in many countries. A combination of peer-to-peer learning, mentorship, online training, open data initiatives, and new programs at journalism schools rising to the challenge, however, offers reasons to be optimistic about more journalists learning to treat data as a source.
MOOCs: Expectations and Reality (PDF)
Source: Teachers College, Columbia University
Over the past few years, observers of higher education have speculated about dramatic changes that must occur to accommodate more learners at lower costs and to facilitate a shift away from the accumulation of knowledge to the acquisition of a variety of cognitive and non-cognitive skills. All scenarios feature a major role for technology and online learning. Massive open online courses (MOOCs) are the most recent candidates being pushed forward to fulfill these ambitious goals. To date, there has been little evidence collected that would allow an assessment of whether MOOCs do indeed provide a cost-effective mechanism for producing desirable educational outcomes at scale. It is not even clear that these are the goals of those institutions offering MOOCs. This report investigates the actual goals of institutions creating MOOCs or integrating them into their programs, and reviews the current evidence regarding whether and how these goals are being achieved, and at what cost.
Through interviews with 83 administrators, faculty members, researchers, and other actors from 62 different institutions (see Appendices I, III and VI for details) active in the MOOCspace or more generally in online learning, we observed that colleges and universities have adopted several different stances towards engaging with MOOCs and are using them as vehicles to pursue multiple goals. Some institutions are actively developing MOOCs and may be termed “producers,” some are using MOOCs developed by other institutions in their programs and could be termed “consumers,” and a few are doing both. Others are adopting a “wait-and-see” approach, or have considered MOOCs and have decided against any form of official engagement. There is no doubt, however, that the advent of MOOCs has precipitated many institutions to consider or revisit their strategy with respect to online learning, whether at large scale or small.
Amidst Bitter Cold and Rising Energy Costs, New Concerns About Energy Insecurity
Source: Columbia University (Mailman School of Public Health)
With many regions of the country facing an unrelenting cold snap, the problem of energy insecurity continues to go unreported despite its toll on the most vulnerable. In a new brief, researchers at the Mailman School of Public Health paint a picture of the families most impacted by this problem and suggest recommendations to alleviate its chokehold on millions of struggling Americans. The authors note that government programs to address energy insecurity are coming up short, despite rising energy costs.
Energy Insecurity (EI) is measured by the proportion of household energy expenditures relative to household income. Lower-income families are more likely to experience EI because they tend to live in housing that has not benefited from the structural improvements that wealthier Americans can afford.
State Hazard Mitigation Plans & Climate Change: Rating the States (PDF)
Source: Columbia University Law School, Center for Climate Change Law
Climate change is affecting and will continue to affect the frequency and severity of natural hazard events, a trend that is of increasing concern for emergency managers and hazard mitigation agencies across the United States. Proper response to these hazards will require preparation and planning. Unfortunately, states are not required to include analysis of climate change in their State Hazard Mitigation Plans, which leads to uneven treatment of the issue and missed opportunities for mitigation planning. This survey identifies those state plans that address climate change and climate-related issues in an accurate and helpful manner and those that do not. Several states will be releasing updated State Hazard Mitigation Plans in 2013 and 2014, and this survey forms a basis for improving those plans through shared lessons learned and targeted communication. The results of the survey indicate that coastal states are more likely to include a discussion of climate change, possibly due in part to recent emphasis on and awareness of the relationship between climate change and sea level rise, coastal storms, and related hazards. The relative lack of discussion of climate change in land-locked states may point to a need for greater communication of how risks such as drought, floods, heat events, and non-coastal storms are affected by climate change. State plans that currently include climate change analyses and adaptation plans may be used as examples for improving other plans. This survey provides a basis for further analysis comparing future plans and determining whether they include an improved discussion of climate change.
Did Flexibility Compromise No Child Left Behind?
Source: Columbia Business School
In 2001, the US Congress passed the No Child Left Behind Act in an effort to measure and improve student performance in math and English language skills. The law required states to adopt standardized tests and to create timelines, with annual benchmarks, that would bring student proficiency levels to 100 percent by 2014. Schools were judged not only on their overall achievement rates, but on the performance of various subgroups, such as students from low-income families or historically disadvantaged ethnic groups. If any of these subgroups failed to meet the annual goals, the school would fail and therefore face penalties including the loss of funding and the possibility of restructuring or closure.
Under these new regulations, however, states were allowed wide flexibility. States were allowed to choose their own standardized tests, set their annual benchmarks, and consider various allowances when grading a school’s performance. The result? Small, often subtle differences in implementation led to significant differences in measured outcomes, according to new research by Professor Jonah Rockoff, who worked with Elizabeth Davidson of Teachers College and Randall Reback of Barnard College, both at Columbia University, and with Heather Schwartz of the RAND Corporation.
One of these subtle yet significant differences in implementation was the use, by some states, of a confidence interval, a statistical means of accounting for sampling error. Suppose a state set its math proficiency benchmark at 58 percent, and 56 percent of students in a particular school score above the state’s proficiency level. The question then becomes: can state administrators be fairly certain that if all the students at the school took the test again, it would still fail to reach 58 percent? In order to provide more assurance that failing schools were truly below the benchmark, states adopted confidence intervals as high as 90, 95, or even 99 percent.
With a large confidence interval, the real bar is often far below the stated benchmark, Rockoff explains. “It might sound trivial, but with a very wide confidence interval, maybe only 30 percent of students had to pass the test,” instead of 58 percent, he says. “It’s odd, because we never use confidence intervals in grading; a student could not tell a teacher, ‘It’s true that I failed, but you can’t be 99 percent sure that if you tested me again, I would still fail.’” Yet that is essentially how many states implemented No Child Left Behind. And while some states had wide confidence intervals, others set none, leading to dramatic differences in reported failure rates.
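The way a confidence interval lowers the real bar can be sketched numerically. The article does not give any state's formula, so this example assumes a one-sided normal approximation with the standard error computed at the benchmark. The function names and the group sizes are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def effective_bar(benchmark, n_students, confidence):
    """Lowest observed pass rate that still counts as meeting the benchmark.

    Assumption: one-sided normal approximation, standard error evaluated at
    the benchmark. Real state rules may have differed.
    """
    z = NormalDist().inv_cdf(confidence)  # e.g. about 1.645 for 0.95
    standard_error = sqrt(benchmark * (1 - benchmark) / n_students)
    return max(0.0, benchmark - z * standard_error)


def school_passes(observed_rate, benchmark, n_students, confidence):
    """True if the observed rate is not confidently below the benchmark."""
    return observed_rate >= effective_bar(benchmark, n_students, confidence)


# The article's example: benchmark 58 percent, observed 56 percent.
# The group size of 100 is a made-up value for illustration.
passes_with_interval = school_passes(0.56, 0.58, 100, 0.95)
passes_without_interval = school_passes(0.56, 0.58, 100, 0.50)  # z = 0, no allowance
small_group_bar = effective_bar(0.58, 17, 0.99)
```

Under these assumptions the 56 percent school passes once a 95 percent interval is applied and fails without one. For a hypothetical subgroup of 17 students at 99 percent confidence, the effective bar falls to roughly 30 percent, which shows how small groups and high confidence levels combine to lower the requirement.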
On Information Distortions in Online Ratings
Source: Columbia University Business School (Besbes)
Online reviews for products and services are typically reported sequentially, rather than in parallel; new reviews come in all the time, not all at once. Most consumers, before submitting a review, are therefore exposed to reviews by others that have already been posted, and tend to look at this history — particularly at the average review, often synthesized in a star or number ranking — before writing their own. Previous research has shown that a review tends to reflect not just a consumer’s personal opinion, but also the influence of reviews that preceded it.
In a recent working paper, Professor Omar Besbes, working with Marco Scarsini of the Singapore University of Technology and Design, examined how the sequential nature of reviews distorts the statistics of ratings (such as the average or their distribution) from what might be called the “true” statistics — those one would observe if consumers submitted their reviews all at once, without reviewing previously submitted reviews. The researchers analyze a broad class of consumer reporting mechanisms that account for past reviews, dividing those into two behavioral categories: compensating behavior and herding behavior. Compensating consumers award an inflated high score or an exaggeratedly low score in an attempt to shift the average of ratings toward the rating they believe the product deserves. Herding consumers, on the other hand, follow the prevailing opinion, shifting their own personal rating toward the average in the belief that the crowd must be right.
Whether consumers are herding or compensating, the average of reported ratings for a product will stabilize over time; however, the researchers found, this average rating might be above or below the “true” average, a difference they define as the bias gap. Compensating and herding behavior have very different effects on this gap, the researchers’ model showed. While compensating tends to have limited influence on where a score eventually stabilizes, the influence of herding can be very significant. While the average rating is highly influenced by the sequential nature of reviews, the position of a product or a service relative to competitors is usually preserved. The researchers further analyze the potential for manipulation of reviews and show that compensating behavior limits the impact of review manipulation. However, when herding behavior occurs, a few very early reviews can dramatically shift the trajectory of a product’s ratings toward the high or low extremes.
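A toy simulation can make the contrast between the two behaviors concrete. It is not the model from the working paper. It assumes simple linear update rules on a 1-to-5 scale, a fixed cycle of honest opinions, and three planted five-star reviews; the function `final_average` and the strength values are hypothetical.

```python
def final_average(opinions, early_reviews, behavior, strength):
    """Average rating after consumers report one at a time.

    Each consumer sees the running average before rating:
      herding      -- moves the rating toward the current average
      compensating -- exaggerates the rating away from the current average
      independent  -- reports the personal opinion unchanged
    Ratings are clipped to the 1..5 scale. These rules are assumptions.
    """
    ratings = list(early_reviews)
    for opinion in opinions:
        average = sum(ratings) / len(ratings)
        if behavior == "herding":
            rating = opinion + strength * (average - opinion)
        elif behavior == "compensating":
            rating = opinion + strength * (opinion - average)
        else:  # "independent"
            rating = opinion
        ratings.append(min(5.0, max(1.0, rating)))
    return sum(ratings) / len(ratings)


opinions = [2.0, 3.0, 4.0] * 167  # honest opinions; their true average is 3.0
planted = [5.0, 5.0, 5.0]         # a few manipulated early reviews

independent = final_average(opinions, planted, "independent", 0.0)
herding = final_average(opinions, planted, "herding", 0.8)
compensating = final_average(opinions, planted, "compensating", 1.0)
```

With these settings the herding average stays well above the true average of 3.0 even after about 500 honest consumers, while the compensating average ends close to 3.0. The difference between each result and 3.0 plays the role of the bias gap in this sketch.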
Intensity and Attachment: How the Chaotic Enrollment Patterns of Community College Students Affect Educational Outcomes
Source: Community College Research Center (Columbia University Teachers College)
This paper examines the relationship between community college enrollment patterns and two successful student outcomes—credential completion and transfer to a four-year institution. It also introduces a new way of visualizing the various attendance patterns of community college students. Patterns of enrollment intensity (full-time or part-time status) and continuity (enrolling in consecutive terms or skipping one or more terms) are graphed and then clustered according to their salient features.
Using data on cohorts of first-time community college students at five colleges in a single state, the author finds that, over an 18-semester period, 10 patterns of attendance account for nearly half the students. Among the remaining students who persisted, there is astounding variation in their patterns of enrollment. Clustering these patterns reveals two relationships: the first is a positive association between enrollment continuity and earning a community college credential, and the second is a positive association between enrollment intensity and likelihood of transfer.
Source: National Center on Addiction and Substance Abuse (Columbia University)
Forty million Americans ages 12 and older have addiction involving nicotine, alcohol or other drugs, a disease affecting more Americans than heart conditions, diabetes or cancer, according to a five-year national study released today by The National Center on Addiction and Substance Abuse at Columbia University (CASA Columbia). Another 80 million people are risky substance users – using tobacco, alcohol and other drugs in ways that threaten health and safety.
The report, Addiction Medicine: Closing the Gap between Science and Practice, reveals that while about 7 in 10 people with diseases like hypertension, major depression and diabetes receive treatment, only about 1 in 10 people who need treatment for addiction involving alcohol or other drugs receive it. Of those who do receive treatment, most do not receive anything that approximates evidence-based care.
The CASA Columbia report finds that addiction treatment is largely disconnected from mainstream medical practice. While a wide range of evidence-based screening, intervention, treatment and disease management tools and practices exist, they rarely are employed. The report exposes the fact that most medical professionals who should be providing treatment are not sufficiently trained to diagnose or treat addiction, and most of those providing addiction treatment are not medical professionals and are not equipped with the knowledge, skills or credentials necessary to provide the full range of evidence-based services.
Source: Congressional Research Service (via Federation of American Scientists)
The U.S.-Colombia Trade Promotion Agreement entered into force on May 15, 2012. It is a comprehensive free trade agreement (FTA) between the United States and Colombia, which will eventually eliminate tariffs and other barriers in bilateral trade in goods and services. On October 3, 2011, President Barack Obama submitted draft legislation (H.R. 3078/S. 1641) to both houses of Congress to implement the agreement. On October 12, 2011, the House passed H.R. 3078 (262-167) and sent it to the Senate. The Senate passed the implementing legislation (66-33) on the same day. The agreement was signed by both countries almost five years earlier, on November 22, 2006. The Colombian Congress approved it in June 2007 and again in October 2007, after it was modified to include new provisions agreed to in the May 10, 2007 bipartisan understanding between congressional leadership and President George W. Bush.
The United States is Colombia’s leading trade partner. Colombia accounts for a very small percentage of U.S. trade (1.0% in 2011), ranking 22nd among U.S. export markets and 23rd as a supplier of U.S. imports. Economic studies on the impact of a U.S.-Colombia free trade agreement (FTA) have found that, upon full implementation of an agreement, the impact on the United States would be positive but very small due to the small size of the Colombian economy when compared to that of the United States (about 2.2%).
The congressional debate surrounding the CFTA mostly centered on violence, labor, and human rights issues in Colombia. Numerous Members of Congress opposed passage of the agreement because of concerns about alleged targeted violence against union members in Colombia, inadequate efforts to bring perpetrators to justice, and weak protection of worker rights. However, other Members of Congress supported the CFTA and took issue with these charges, stating that Colombia had made great progress over the last ten years to curb violence and enhance security. They also argued that U.S. exporters were losing market share of the Colombian market and that the agreement would open the Colombian market for U.S. goods and services. For Colombia, an FTA with the United States has been part of its overall economic development strategy.
To address the concerns related to labor rights and violence in Colombia, the United States and Colombia agreed upon an “Action Plan Related to Labor Rights” that included specific and concrete steps, with specific timelines, most of which took place in 2011. It includes numerous commitments by the Colombian government to protect union members, end impunity, and improve worker rights. The Colombian government submitted documents to the United States in time to meet various target dates listed in the Action Plan. The USTR reviewed the documents and determined that Colombia had met its major commitments.
The U.S. business community generally supports the FTA with Colombia because it sees it as an opportunity to increase U.S. exports to Colombia. U.S. exporters urged policymakers to move forward with the agreement, arguing that the United States was losing market share of the Colombian market, especially in agriculture, as Colombia entered into FTAs with other countries. Colombia’s FTA with Canada, which was implemented on August 15, 2011, was of particular concern for U.S. agricultural producers. Critics of the agreement expressed concerns about violence against union members and the lack of protection of worker rights in Colombia, especially in labor cooperatives. Labor unions in general remain highly opposed to the agreement. They argue that Colombia’s labor movement is under attack through violence, intimidation, and harassment, as well as legal challenges.
Source: American Assembly (Columbia University)
This research note is an effort to bring American public opinion to bear on this vital conversation. The note excerpts a forthcoming survey-based study called Copy Culture in the U.S. and Germany. Drawing on results from the U.S. portion of the survey, it explores what Americans do with digital media, what they want to do, and how they reconcile their attitudes and values with different policies and proposals to enforce copyright online.
The Copy Culture survey was sponsored by The American Assembly, with support from a research award from Google. The content of the survey and its findings are solely the responsibility of the researchers. The U.S. survey was conducted by Princeton Survey Research Associates International. The results are based on interviews on landline and cellular telephones conducted in English with 2,303 adults age 18 or older living in the continental United States from August 1-31, 2011. For results based on the entire sample, the margin of error is plus or minus 2 percentage points.
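The reported margin of error can be checked against the sample size with the textbook formula. The note does not state the confidence level or any design adjustment, so this sketch assumes simple random sampling, a 95 percent confidence level, and the worst-case proportion of 0.5. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sample_size, confidence=0.95, proportion=0.5):
    """Half-width of a two-sided interval for a sample proportion.

    Assumes simple random sampling; weighting or design effects, which
    real telephone surveys usually have, would widen the margin.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(proportion * (1 - proportion) / sample_size)


moe = margin_of_error(2303)  # the survey's 2,303 respondents
```

Under these assumptions the result is roughly 2 percentage points, consistent with the figure quoted for the full sample. Results for subgroups of the sample would carry a larger margin.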
Under-Savers Anonymous: Evidence on Self-Help Groups and Peer Pressure as a Savings Commitment Device
While commitment devices such as defaults and direct deposits from wages have been found to be highly effective at increasing savings, they are unavailable to the millions of people worldwide who do not have a formal wage bill. Self-help peer groups are an alternative commitment device that is widespread and highly accessible, but there is little empirical evidence evaluating their effectiveness. We conduct two randomized field experiments among low-income micro-entrepreneurs in Chile. The first experiment finds that self-help peer groups are very potent at increasing savings. In contrast, a more classical measure, a substantially increased interest rate, has no effect on the vast majority of participants. A second experiment is designed to unbundle the key elements of peer groups as a commitment device, through the use of regular text messages. It finds that, surprisingly, actual meetings and peer pressure do not seem to be crucial in making self-help peer groups an effective tool to encourage savings.
+ Full Paper (PDF)