The era of behavioural big data has created new avenues for data science research, with many new contributions stemming from academic researchers. Yet data controlled by platforms have become increasingly difficult for academics to access. Platforms now routinely use algorithmic behaviour modification techniques to manipulate users’ behaviour, leaving academic researchers further isolated in conducting important data science and computational social science research. This isolation results from researchers’ lack of access to human behavioural data and, crucially, to both the data on machine behaviour that triggers and learns from the human data and the platform’s behaviour modification mechanisms. Given the impact of behaviour modification on individual and societal well-being, we discuss the consequences for data science knowledge creation, and encourage academic data scientists to take on new roles in producing research to promote (1) platform transparency and (2) informed public debate around the social purpose and function of digital platforms.
We thank C. Rudin, F. Provost and T. Evgeniou for their valuable feedback and suggestions.
The authors declare no competing interests.
Peer review information
Nature Machine Intelligence thanks Maytal Saar-Tsechansky and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Greene, T., Martens, D. & Shmueli, G. Barriers to academic data science research in the new realm of algorithmic behaviour modification by digital platforms. Nat Mach Intell 4, 323–330 (2022). https://doi.org/10.1038/s42256-022-00475-7