MSc thesis proposal: Knowledge Distribution and Code Quality After Open-Source Forking at CodeScene AB

Efficient software development is a competitive advantage that allows companies to maintain a short time-to-market. To succeed, companies need to invest in their source code quality to actively counteract technical debt, i.e., the cost of additional rework caused by choosing a quick fix now instead of a better approach that would take longer. Adding to that challenge, software companies face a global shortage of software developers. While the industry struggles to recruit enough talent, research indicates that up to 42% of developers’ time is wasted dealing with technical debt. Adequate management of technical debt is therefore a key success factor for any software company.

Successful software projects depend on how well knowledge is distributed within the team. As developers are in high demand, many regularly switch employers. When key developers leave, the resulting knowledge loss is often very costly. Like the code quality issues more commonly associated with technical debt, knowledge loss reduces an organization's ability to keep a code base easy to change. Organizations need to mitigate this risk by ensuring that knowledge is distributed among the developers. This supports both off-boarding of developers with substantial source code mastery and on-boarding of new developers eager to learn.

CodeScene

CodeScene is a software engineering intelligence platform that analyses the evolution of software development projects to identify patterns in the development and maintenance of a system. The platform offers a wide range of features related to source code quality, social metrics in development teams, and business KPIs. For example, CodeScene uses the identified patterns to suggest refactorings, prioritize technical debt, identify productivity bottlenecks, highlight high-risk pull requests, and detect vulnerable knowledge distribution.

CodeScene is used in production on systems of all scales, including systems with tens of millions of lines of code and hundreds of developers. One of CodeScene’s features is measuring knowledge distribution in software projects. Combined with our validated approach to measuring Code Health, we believe our platform can predict the impact of an individual developer leaving a project. However, additional research is needed to fine-tune the predicted long-term costs.

OpenOffice/LibreOffice and MySQL/MariaDB

The OpenOffice project was established in 2000, building on initial development by the German company StarDivision. StarDivision had been acquired by Sun Microsystems in 1999, and Sun was in turn acquired by Oracle in 2010. However, many OpenOffice community members were concerned by Oracle's behavior towards open-source software (OSS). As a result, LibreOffice was created as a fork of OpenOffice in 2010. Many developers left OpenOffice for LibreOffice, i.e., the knowledge loss was significant. In 2011, Oracle contributed the OpenOffice source code to the Apache Software Foundation, which has maintained Apache OpenOffice since then.

The first version of MySQL was released in 1995 by MySQL AB. The source code became OSS under the GPL license in 2000. In 2008, Sun Microsystems acquired MySQL AB and, as mentioned above, Sun was itself acquired by Oracle in 2010. The year before that acquisition, in 2009, some of the leading MySQL developers created MariaDB as a fork. Since 2012, the MariaDB Foundation has overseen the development of MariaDB. Both MySQL and MariaDB are popular database management systems today.

Assignment

The fork of a large-scale project is not simply a code fork, but also an organizational fork; each of the post-fork teams must continue to manage the entire project without the knowledge and expertise of the other team. This provides researchers with a rare opportunity to chart the impact of knowledge loss in terms of key metrics such as code quality and defects.

In this project, you will use CodeScene to gather rich insights into knowledge distribution and software quality for the OpenOffice project before and after the LibreOffice fork. You will then run the same analysis for the MySQL/MariaDB fork and compare the results.

We propose the following research questions:

RQ1. How can the knowledge loss resulting from the forks be characterized in terms of scope and duration?

RQ2. How did the knowledge loss from forking impact the code quality of the products?

RQ3. Did the number of bug reports in software modules with substantial knowledge loss increase after the forks?

RQ4. How did the OSS communities mitigate the knowledge loss from forking?
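
As a starting point for RQ1, the scope of knowledge loss could be approximated directly from version-control history. The sketch below is only an illustration under stated assumptions and is not CodeScene's method: it uses commit counts as a crude proxy for knowledge, treats a developer as "lost" if they have no commits in the repository on or after the fork date, and reports, per module, the share of pre-fork commits made by lost developers. The Commit record, the knowledge_loss function, the author and module names, and the fork date are all hypothetical; real input would have to be extracted from the project history, for example by parsing git log output.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Commit:
    """Hypothetical, simplified commit record (one module per commit)."""
    author: str
    day: date
    module: str


def knowledge_loss(commits, fork_day):
    """Per module: share of pre-fork commits by authors with no post-fork commits."""
    # Authors who still commit to this repository on or after the fork date.
    retained = {c.author for c in commits if c.day >= fork_day}

    total = defaultdict(int)  # pre-fork commits per module
    lost = defaultdict(int)   # pre-fork commits per module by departed authors
    for c in commits:
        if c.day < fork_day:
            total[c.module] += 1
            if c.author not in retained:
                lost[c.module] += 1
    return {module: lost[module] / total[module] for module in total}


# Made-up example data; names, modules, and dates are for illustration only.
commits = [
    Commit("alice", date(2009, 5, 1), "writer"),
    Commit("alice", date(2010, 3, 1), "writer"),
    Commit("bob", date(2010, 4, 1), "writer"),
    Commit("bob", date(2010, 6, 1), "calc"),
    Commit("carol", date(2010, 1, 1), "calc"),
    Commit("bob", date(2011, 2, 1), "calc"),
]
fork_day = date(2010, 9, 28)  # assumed fork date for the example

loss = knowledge_loss(commits, fork_day)
for module, share in sorted(loss.items()):
    print(f"{module}: {share:.0%} of pre-fork commits by departed developers")
```

Tracking the same ratio over a sliding time window after the fork would give a first view of the duration of the loss, and the per-module values could serve as the "substantial knowledge loss" grouping needed for RQ3. A real study would need to handle author aliases, commits touching several modules, and a more refined knowledge measure than raw commit counts.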

Tasks and learning opportunities

  • Understand the complexity of large-scale open-source software forking.
  • Learn to use a state-of-the-art software engineering intelligence solution.
  • Conduct a quantitative multiple case study on the impact of forking in OSS projects.

Key skills:

  • Essential: software engineering
  • Preferred: statistics, git, open-source software, mining software repositories

References

[1] J. Gamalielsson and B. Lundell. Sustainability of Open Source Software Communities Beyond a Fork: How and Why Has the LibreOffice Project Evolved? Journal of Systems and Software, 89, pp. 128-145, 2014.

[2] A. Tornhill and M. Borg. Code Red: The Business Impact of Code Quality - A Quantitative Study of 39 Proprietary Production Codebases, In Proc. of the IEEE/ACM International Conference on Technical Debt (TechDebt), pp. 11-20, 2022.

[3] M. Nassif and M. Robillard. Revisiting Turnover-Induced Knowledge Loss in Software Projects, In Proc. of the IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 261-272, 2017.

[4] P. Rigby, Y. Zhu, S. Donadelli and A. Mockus. Quantifying and Mitigating Turnover-Induced Knowledge Loss: Case Studies of Chrome and a Project at Avaya, In Proc. of the IEEE/ACM International Conference on Software Engineering, pp. 1006-1016, 2016.

Practical Details

Expected start time: January-February 2023

Location: CodeScene, Hyllie, Malmö

Contact: Dr. Markus Borg

Attention: You often need pre-approval from your university or study counselor to ensure that a project or thesis found on SH Karriär will be accepted as part of your education. Please contact the appropriate entity in good time to make sure that you are picking the right project.