Barriers to academic data science research in the new realm of algorithmic behaviour modification by digital platforms

Abstract

The era of behavioural big data has created new avenues for data science research, with many new contributions stemming from academic researchers. Yet data controlled by platforms have become increasingly difficult for academics to access. Platforms now routinely use algorithmic behaviour modification techniques to manipulate users’ behaviour, leaving academic researchers further isolated in conducting important data science and computational social science research. This isolation results from researchers’ lack of access to human behavioural data and, crucially, to both the data on machine behaviour that triggers and learns from the human data and the platform’s behaviour modification mechanisms. Given the impact of behaviour modification on individual and societal well-being, we discuss the consequences for data science knowledge creation, and encourage academic data scientists to take on new roles in producing research to promote (1) platform transparency and (2) informed public debate around the social purpose and function of digital platforms.

Fig. 1: Human users’ behavioural data and related machine data used for BMOD and prediction.

Acknowledgements

We thank C. Rudin, F. Provost and T. Evgeniou for their valuable feedback and suggestions.

Author information

Corresponding author

Correspondence to Galit Shmueli.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Maytal Saar-Tsechansky and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Greene, T., Martens, D. & Shmueli, G. Barriers to academic data science research in the new realm of algorithmic behaviour modification by digital platforms. Nat Mach Intell 4, 323–330 (2022). https://doi.org/10.1038/s42256-022-00475-7

  • DOI: https://doi.org/10.1038/s42256-022-00475-7
