Abstract: | Tens of millions of individuals around the world use decentralized content distribution systems, a fact of growing social, economic, and technological importance. These sharing systems are poorly understood because, unlike in other technosocial systems, it is difficult to gather large-scale data about user behavior. Here, we investigate user activity patterns and the socioeconomic factors that could explain this behavior. Our analysis reveals that (i) the ecosystem is heterogeneous at several levels: content types are heterogeneous, users specialize in a few content types, and countries are heterogeneous in user profiles; and (ii) there is a strong correlation between the socioeconomic indicators of a country and its users’ behavior. Our findings open a research area on the dynamics of decentralized sharing ecosystems and the socioeconomic factors affecting them, and may have implications for the design of algorithms and for policymaking. Every month, ∼150 million users worldwide share files over the Internet using BitTorrent (1), the most widely used decentralized peer-to-peer (P2P) communication protocol. Eleven years after its inception, file sharing through BitTorrent is one of the top three contributors to overall Internet traffic, accounting for 9–27% of the total traffic, depending on the continent (2, 3). The expansion in scale and breadth of decentralized file sharing has highlighted the conflicts between the interests of creators (e.g., musicians and writers) and those of P2P users. Creators and creative industries argue that they are being deprived of fair compensation for their work (4), which is being widely distributed for free in violation of copyright laws. Users, however, argue that P2P can be (and is) used for sharing nonproprietary content, and warn that widespread monitoring of online activity by corporations and law enforcement violates P2P users’ right to privacy.
Evidence of the complexity of the situation includes the rejection of the Anti-Counterfeiting Trade Agreement by the European Parliament and the controversy over the Stop Online Piracy Act in the United States. Despite the growing social, economic, and technological importance of BitTorrent (4), there is currently little understanding of how users behave in this complex technosocial (5, 6) ecosystem. Due to the decentralized structure of P2P ecosystems, it is very difficult to gather large-scale data about the interactions and behavioral patterns of users without their explicit consent; this is in contrast to other forms of online exchange where all of the information is stored in a central system, be it publicly accessible as in Wikipedia (7), partially accessible through a public interface as in Twitter (8, 9) or Google [through its search logs (10) or its public services (11, 12)], or restricted as in Facebook (13, 14) or in email communications within organizations (15–18). Because of the difficulty of collecting complete user-level data from large and representative samples of users (3), studies of user behavior in P2P networks have so far been based on (i) small datasets; (ii) aggregate data collected from “trackers” or from individual Internet service providers (ISPs); and (iii) incomplete user data collected using a single crawler client connected to the network (19–23). Here, we investigate the complete activity patterns of a large and representative pool of BitTorrent users. Our analysis reveals that P2P sharing is highly heterogeneous, that users are specialized, giving rise to well-defined user profiles, and that the abundance of certain user profiles in a country is highly correlated with socioeconomic factors. Our findings open a research area on the dynamics of decentralized sharing ecosystems, and may have implications for the understanding and design of algorithms and for policymaking. |