Because I work with Stack Overflow data [1] myself, I needed a way to keep track of other Stack Overflow-based research.
The following is an attempt to list the academic papers mentioning Stack Overflow/Exchange or using Stack Exchange data. This also includes the works mentioned in the SO blog entry [2] that started the trend, as well as two other questions on meta, one from 2010 [3] and another one from 2011 [4].
If you know of papers that are not listed, please edit the answer directly.
I realised it might be useful to have all the BibTeX entries in a single file, so I started this GitHub Gist [5].
Kartik Bajaj, Karthik Pattabiraman, and Ali Mesbah. 2014. Mining Questions Asked by Web Developers. In 11th Working Conference on Mining Software Repositories (MSR 2014) ACM [BibTeX] [1] [PDF] [2]
Bogdan Vasilescu, Alexander Serebrenik, Prem Devanbu and Vladimir Filkov. 2014. How social Q&A sites are changing knowledge sharing in open source software communities In 17th ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW 2014) ACM [DOI] [3] [BibTeX] [4] [PDF] [5]
Alexander Halavais, K Hazel Kwon, Shannon Havener and Jason Striker. 2014. Badges of Friendship: Social Influence and Badge Acquisition on Stack Overflow. In 47th Hawaii International Conference on System Sciences (HICSS-47 2014). IEEE [BibTeX] [6] [PDF] [7]
Megan Squire and Christian Funkhouser. 2014. "A Bit of Code": How the Stack Overflow Community Creates Quality Postings In 47th Hawaii International Conference on System Sciences (HICSS-47 2014). IEEE [DOI] [8] [PDF] [9]
Yla Tausczik, Aniket Kittur and Robert Kraut. 2014. Collaborative problem solving: A study of Math Overflow In 17th ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW 2014) ACM, pages 355-367. [DOI] [10] [BibTeX] [11] [PDF] [12]
Dana Movshovitz-Attias and William Cohen, 2013. Natural Language Models for Predicting Programming Comments In Proceedings of the Association for Computational Linguistics ACL [BibTeX] [13] [PDF] [14]
Shuo Chang and Aditya Pal, 2013. Routing questions for collaborative answering in community question answering In Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining ACM, pages 494-501 [DOI] [15] [BibTeX] [16] [PDF] [17]
Priyanka Singh, Nigel Shadbolt, 2013. Linked data in crowdsourcing purposive social network. In WWW (Companion Volume) International World Wide Web Conferences Steering Committee, pages 913-918 [BibTeX] [18] [PDF] [19]
Edmund Wong, Jinqiu Yang, Lin Tan, 2013. AutoComment: Mining question and answer sites for automatic comment generation. In ASE 2013 IEEE, pages 562-567 [DOI] [20] [BibTeX] [21] [PDF] [22]
Alexandru Lucian Ginsca, Adrian Popescu, 2013. User profiling for answer quality assessment in Q&A communities In Proceedings of the 2013 workshop on Data-driven user behavioral modelling and mining from social media ACM, pages 25-28. [BibTeX] [23] [PDF] [24]
Anders Søgaard, Hector Martinez, Jakob Elming, Anders Johannsen, 2013. Using crowdsourcing to get representations based on regular expressions In 2013 Conference on Empirical Methods in Natural Language Processing. ACL, pages 1476-1480 [BibTeX] [25] [PDF] [26]
Rahul Venkataramani, Atul Gupta, Allahbaksh M. Asadullah, Basavaraju Muddu, Vasudev D. Bhat, 2013. Discovery of technical expertise from open source code repositories In Proceedings of the 22nd international conference on World Wide Web companion International World Wide Web Conferences Steering Committee, pages 97-98 [BibTeX] [27] [PDF] [28]
Joshua Saxe, David Mentis, Chris Greamo, 2013. Mining Web Technical Discussions to Identify Malware Capabilities In IEEE 33rd International Conference on Distributed Computing Systems Workshops. IEEE, pages 1-5 [DOI] [29] [BibTeX] [30] [PDF] [31]
Amiangshu Bosu, Christopher S. Corley, Dustin Heaton, Debarshi Chatterji, Jeffrey C. Carver, Nicholas A. Kraft, 2013. Building reputation in StackOverflow: an empirical investigation In Proceedings of the 10th Working Conference on Mining Software Repositories. IEEE, pages 89-92. [BibTeX] [32] [PDF] [33]
David Kavaler, Daryl Posnett, Clint Gibler, Hao Chen, Premkumar Devanbu, Vladimir Filkov, 2013. Using and Asking: APIs Used in the Android Market and Asked about in StackOverflow In Social Informatics. Springer Verlag, Lecture Notes in Computer Science Volume 8238, 2013, pp 405-418 [DOI] [34] [BibTeX] [35] [PDF] [36]
Mohammad Masudur Rahman, Shamima Yeasmin, Chanchal K. Roy, 2013. An IDE-Based Context-Aware Meta Search Engine In 20th Working Conference on Reverse Engineering. IEEE, pages 467-471. [DOI] [37] [BibTeX] [38] [PDF] [39]
Galina E. Lezina, Artem M. Kuznetsov, 2013. Predict Closed Questions on StackOverflow In Proceedings of the Ninth Spring Researcher’s Colloquium on Database and Information Systems CEUR-WS, Volume 1031, pages 10-14. [PDF] [40]
Dana Movshovitz-Attias, Yair Movshovitz-Attias, Peter Steenkiste and Christos Faloutsos, 2013. Analysis of the reputation system and user contributions on a question answering website: StackOverflow In Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining ACM, pages 886-893 [BibTeX] [41] [DOI] [42] [PDF] [43]
Blerina Bazelli, Abram Hindle, Eleni Stroulia, 2013. On the Personality Traits of StackOverflow Users In 2013 IEEE International Conference on Software Maintenance. IEEE, pages 460-463 [BibTeX] [44] [DOI] [45] [PDF] [46]
Clayton Stanley, Michael D. Byrne. 2013. Predicting Tags for StackOverflow Posts In 12th International Conference on Cognitive Modelling, pages 414-419. [PDF] [47]
Shaowei Wang, David Lo and Lingxiao Jiang. 2013. An Empirical Study on Developer Interactions in StackOverflow. In 28th Annual ACM Symposium on Applied Computing. ACM, pages 1019-1024. [DOI] [48] [BibTeX] [49] [PDF] [50]
Dennis Schenk and Mircea Lungu. 2013. Geo-Locating the Knowledge Transfer in Stack Overflow. In International Workshop on Social Software Engineering. ACM, pages 21-24. [DOI] [51] [BibTeX] [52] [PDF] [53]
Bogdan Vasilescu, Andrea Capiluppi and Alexander Serebrenik. 2013. Gender, representation and online participation: A quantitative study. In Interacting with Computers. Oxford University Press. 1-24 [DOI] [54] [BibTeX] [55] [PDF] [56] [SlideShare] [57] [Gender tool] [58]
Bogdan Vasilescu, Vladimir Filkov and Alexander Serebrenik. 2013. StackOverflow and GitHub: Associations between software development and crowdsourced knowledge In 2013 ASE/IEEE International Conference on Social Computing. IEEE, pages 188-195. [BibTeX] [59] [PDF] [60] [Slides] [61]
Bogdan Vasilescu, Alexander Serebrenik, Mark G.J. van den Brand. 2013. The Babel of software development: Linguistic diversity in Open Source In 5th International Conference on Social Informatics. Springer LNCS 8238, pages 391-404. [DOI] [62] [BibTeX] [63] [PDF] [64]
Yuan Yao, Hanghang Tong, Tao Xie, Leman Akoglu, Feng Xu and Jian Lu. 2013. Want a Good Answer? Ask a Good Question First! [arXiv] [65] [non-technical overview] [66] [BibTeX] [67]
Denzil Correa and Ashish Sureka. 2013. Fit or Unfit: Analysis and Prediction of 'Closed Questions' on Stack Overflow. In 2013 ACM Conference on Online Social Networks. ACM, pages 201-212. [DOI] [68] [BibTeX] [69] [PDF] [70]
Luca Ponzanelli, Alberto Bacchelli and Michele Lanza. 2013. Seahawk: Stack Overflow in the IDE In 2013 International Conference on Software Engineering. IEEE, pages 1295-1298. [DOI] [71] [BibTeX] [72] [PDF] [73]
Ashton Anderson, Dan Huttenlocher, Jon Kleinberg, and Jure Leskovec. 2013. Steering User Behavior With Badges In 22nd International World Wide Web Conference (WWW’13). ACM, pages 95-106. [DOI] [74] [BibTeX] [75] [PDF] [76]
Miltiadis Allamanis and Charles Sutton. 2013. Why, When and What: Analyzing Stack Overflow Questions by Topic, Type & Code In 10th Working Conference on Mining Software Repositories. Mining Challenge. IEEE, pages 53-56. [DOI] [77] [BibTeX] [78] [PDF] [79]
Muhammad Asaduzzaman, Ahmed Mashiyat, Chanchal Roy and Kevin Schneider. 2013. Answering Questions About Unanswered Questions of Stack Overflow In 10th Working Conference on Mining Software Repositories. Mining Challenge. IEEE, pages 97-100. [DOI] [80] [BibTeX] [81] [PDF] [82]
Amiangshu Bosu, Christopher Corley, Dustin Heaton, Debarshi Chatterji, Jeffrey Carver and Nicholas Kraft. 2013. Building Reputation in StackOverflow: An Empirical Investigation In 10th Working Conference on Mining Software Repositories. Mining Challenge. IEEE, pages 89-92. [DOI] [83] [BibTeX] [84] [PDF] [85]
Joshua Campbell, Chenlei Zhang, Zhen Xu, Abram Hindle and James Miller. 2013. Deficient Documentation Detection: A Methodology to Locate Deficient Project Documentation using Topic Analysis In 10th Working Conference on Mining Software Repositories. Mining Challenge. IEEE, pages 57-60. [DOI] [86] [BibTeX] [87] [PDF] [88] [SlideShare] [89]
Carlos Gomez, Brendan Cleary and Leif Singer. 2013. A Study of Innovation Diffusion Through Link Sharing on Stack Overflow In 10th Working Conference on Mining Software Repositories. Mining Challenge. IEEE, pages 81-84. [DOI] [90] [BibTeX] [91] [PDF] [92]
Scott Grant and Buddy Betts. 2013. Encouraging User Behaviour With Achievements: An Empirical Study In 10th Working Conference on Mining Software Repositories. Mining Challenge. IEEE, pages 65-68. [DOI] [93] [BibTeX] [94] [PDF] [95]
Mario Linares-Vásquez, Bogdan Dit and Denys Poshyvanyk. 2013. An Exploratory Analysis of Mobile Development Issues Using Stack Overflow In 10th Working Conference on Mining Software Repositories. Mining Challenge. IEEE, pages 93-96. [DOI] [96] [BibTeX] [97] [PDF] [98]
Patrick Morrison and Emerson Murphy-Hill. 2013. Is Programming Knowledge Related To Age? - An Exploration of Stack Overflow In 10th Working Conference on Mining Software Repositories. Mining Challenge. IEEE, pages 69-72. [DOI] [99] [BibTeX] [100] [PDF] [101]
Avigit K. Saha, Ripon K. Saha and Kevin A. Schneider. 2013. A Discriminative Model Approach for Suggesting Tags Automatically for Stack Overflow Questions In 10th Working Conference on Mining Software Repositories. Mining Challenge. IEEE, pages 73-76. [DOI] [102] [BibTeX] [103] [PDF] [104]
Vibha Singhal Sinha, Senthil Mani and Monika Gupta. 2013. Exploring Activeness of Users in Q&A Forums In 10th Working Conference on Mining Software Repositories. Mining Challenge. IEEE, pages 77-80. [DOI] [105] [BibTeX] [106]
Siddharth Subramanian and Reid Holmes. 2013. Making Sense of Online Code Snippets In 10th Working Conference on Mining Software Repositories. Mining Challenge. IEEE, pages 85-88. [DOI] [107] [BibTeX] [108] [PDF] [109] [SlideShare] [110] [Demo] [111]
Wei Wang and Michael Godfrey. 2013. Detecting API Usage Obstacles: A Study of iOS and Android Developer Questions In 10th Working Conference on Mining Software Repositories. Mining Challenge. IEEE, pages 61-64. [DOI] [112] [BibTeX] [113] [PDF] [114]
Yla Tausczik and James Pennebaker. 2012. Participation in an online mathematics community: Differentiating motivations to add In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work. ACM, pages 207-216. [DOI] [115] [BibTeX] [116] [PDF] [117]
Annie T. T. Ying. 2012. Facilitating Code Example Search on the Web through Expertise Personalization In User Modeling, Adaptation, and Personalization Springer Verlag, Lecture Notes in Computer Science Volume 7379, pp 391-394 [DOI] [118] [BibTeX] [119] [PDF] [120]
Leman Akoglu, Duen Horng Chau, U. Kang, Danai Koutra and Christos Faloutsos. 2012. OPAvion: mining and visualization in large graphs. In 2012 ACM SIGMOD International Conference on Management of Data (SIGMOD'12). ACM, pages 717-720. [DOI] [121] [BibTeX] [122] [PDF] [123]
Ashton Anderson, Daniel Huttenlocher, Jon Kleinberg and Jure Leskovec. 2012. Discovering value from community activity on focused question answering sites: a case study of Stack Overflow. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining [DOI] [124] [BibTeX] [125] [PDF] [126]
Alberto Bacchelli, Luca Ponzanelli and Michele Lanza. 2012. Harnessing Stack Overflow for the IDE. In 2012 Third International Workshop on Recommendation Systems for Software Engineering (RSSE). IEEE, pages 26-30. [DOI] [127] [BibTeX] [128] [PDF] [129]
Anton Barua, Stephen W. Thomas and Ahmed E. Hassan. 2012. What are developers talking about? An analysis of topics and trends in Stack Overflow. In Empirical Software Engineering [BibTeX] [130] [PDF] [131]
Benjamin V. Hanrahan, Gregorio Convertino, and Les Nelson. 2012. Modeling problem difficulty and expertise in stackoverflow. In ACM 2012 Conference on Computer Supported Cooperative Work Companion (CSCW '12). ACM, pages 91-94. [DOI] [132] [BibTeX] [133]
Hewijin Jiau and Feng-Pu Yang. 2012. Facing up to the inequality of crowdsourced API documentation. In ACM SIGSOFT Software Engineering Notes. 37 (1): 1-9. [DOI] [134] [BibTeX] [135]
Rafael Lotufo, Leonardo Passos and Krzysztof Czarnecki. 2012. Towards improving bug tracking systems with game mechanisms. In Mining Software Repositories [DOI] [136] [BibTeX] [137] [PDF] [138]
Seyed Mehdi Nasehi, Jonathan Sillito, Frank Maurer and Chris Burns. 2012. What Makes a Good Code Example? A Study of Programming Q&A in StackOverflow. In IEEE International Conference on Software Maintenance (ICSM 2012). IEEE. [DOI] [139] [BibTeX] [140] [PDF] [141]
Aditya Pal, F. Maxwell Harper and Joseph A. Konstan. 2012. Exploring Question Selection Bias to Identify Experts and Potential Experts in Community Question Answering. In ACM Trans. Inf. Syst., vol.30, no.2, pages 10. [DOI] [142] [BibTeX] [143] [PDF] [144]
Aditya Pal, Shuo Chang, and Joseph A. Konstan. 2012. Evolution of Experts in Question Answering Communities. In 6th International AAAI Conference on Weblogs and Social Media. [DOI] [145] [BibTeX] [146] [PDF] [147]
Chris Parnin, Christoph Treude, Lars Grammel, and Margaret-Anne Storey. 2012. Crowd documentation: Exploring the coverage and the dynamics of API discussions on Stack Overflow. Georgia Tech Technical Report GIT-CS-12-05. [PDF] [148]
Daryl Posnett, Eric Warburg, Premkumar Devanbu and Vladimir Filkov. 2012. Mining Stack Exchange: Expertise is Evident From Initial Contributions. In 2012 ASE International Conference on Social Informatics.
Fatemeh Riahi. 2012. Finding expert users in community question answering services using topic models, Master's Thesis, Dalhousie University, Halifax, Nova Scotia. [PDF] [149]
Leif Singer, Fernando Figueira Filho, Brendan Cleary, Christoph Treude, Margaret-Anne Storey, and Kurt Schneider. 2012. Mutual assessment in the social programmer ecosystem: an empirical investigation of developer profile aggregators. Technical report, University of Hannover, Germany. [PDF] [150]
Christoph Treude, Fernando Figueira Filho, Brendan Cleary, and Margaret-Anne Storey. 2012. Programming in a socially networked world: the evolution of the social programmer. In Future of Collaborative Software Development Workshop. [PDF] [151]
Bogdan Vasilescu, Andrea Capiluppi and Alexander Serebrenik. 2012. Gender, representation and online participation: A quantitative study of StackOverflow. In 2012 ASE/IEEE International Conference on Social Informatics. IEEE, pages 332-338. [DOI] [152] [BibTeX] [153] [PDF] [154]
Jakob Voß. 2012. Linking Folksonomies to Knowledge Organization Systems. In 6th Conference on Metadata and Semantics Research (MTSR '12). Springer, pages 89-97. [DOI] [155] [HTML] [156] [PDF] [157] [Slides] [158]
Yla Tausczik and James Pennebaker. 2011. Predicting the Perceived Quality of Online Mathematics Contributions from Users' Reputations In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, pages 1885-1888. [DOI] [1] [BibTeX] [2] [PDF] [3]
Zainab Zolaktaf, Fatemeh Riahi, Mahdi Shafiei, Evangelos Milios, 2011. Modeling Community Question Answering archives. In Workshop on Computational Social Science and the Wisdom of Crowds, Neural Information Processing Systems, December 17, 2011, Sierra Nevada, Spain. [BibTeX] [4] [PDF] [5]
Lena Mamykina, Bella Manoim, Manas Mittal, George Hripcsak, and Björn Hartmann. 2011. Design lessons from the fastest q&a site in the west. In 2011 Annual Conference on Human Factors in Computing Systems (CHI '11). ACM, pages 2857-2866. [DOI] [6] [BibTeX] [7] [PDF] [8]
Toby Osbourn. 2011. Getting the most out of the web. IEEE Software, vol.28, no.1, pages 96. [DOI] [9] [BibTeX] [10]
Aditya Pal, Rosta Farzan, Joseph A. Konstan and Robert E. Kraut. 2011. Early detection of potential experts in question answering communities. In 19th International Conference on User Modeling, Adaption and Personalization (UMAP'11). Springer, pages 231-242. [DOI] [11] [BibTeX] [12] [PDF] [13]
Chris Parnin and Christoph Treude. 2011. Measuring API documentation on the web. In 2nd International Workshop on Web 2.0 for Software Engineering (Web2SE’11), ACM, pages 25-30. [DOI] [14] [BibTeX] [15] [PDF] [16]
Nidhi Raj, Lipika Dey, and Bhakti Gaonkar. 2011. Expertise prediction for social network platforms to encourage knowledge sharing. In 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, Vol. 1. IEEE, pages 380-383. [DOI] [17] [BibTeX] [18]
Daniel Schall and Florian Skopik. 2011. An analysis of the structure and dynamics of large-scale Q/A communities. In 15th International Conference on Advances in Databases and Information Systems (ADBIS'11). [DOI] [19] [BibTeX] [20] [PDF] [21]
Christoph Treude, Ohad Barzilay, and Margaret-Anne D. Storey. 2011. How do programmers ask and answer questions on the web?: NIER track. In 33rd International Conference on Software Engineering (ICSE'11), IEEE, pages 804-807. [DOI] [22] [BibTeX] [23] [PDF] [24]
Zainab Zolaktaf, Fatemeh Riahi, Mahdi Shafiei, and Evangelos Milios. Modeling community question-answering archives. Technical report, Faculty of Computer Science, Dalhousie University, Canada. [PDF] [25]
Barthélémy Dagenais and Martin P. Robillard. 2010. Creating and evolving developer documentation: understanding the decisions of open source contributors In 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE '10), ACM, pages 127-136. [DOI] [26] [BibTeX] [27] [PDF] [28]
Ravi Kumar, Yury Lifshits, and Andrew Tomkins. 2010. Evolution of two-sided markets. In 3rd ACM International Conference on Web Search and Data Mining (WSDM '10). ACM, pages 311-320. [DOI] [29] [BibTeX] [30] [PDF] [31] [Slides] [32]
Hüseyin Oktay, Brian J. Taylor, and David D. Jensen. 2010. Causal discovery in social media using quasi-experimental designs. In First Workshop on Social Media Analytics (SOMA '10). ACM, pages 1-9. [DOI] [33] [BibTeX] [34] [PDF] [35]
Matthew J.H. Rattigan and David Jensen. 2010. Leveraging D-Separation for Relational Data Sets. In 2010 IEEE International Conference on Data Mining (ICDM '10). IEEE, pages 989-994. [DOI] [36] [BibTeX] [37] [PDF] [38]