# Code Review
## Motivation
Code review is the manual assessment of proposed code changes by developers other than the
author, with the goal of improving code quality and reducing the number of software defects. As a
common software engineering practice, code review is applied in industry and in many open-source
projects. In research, code review has become a popular topic in recent years, judging by the
number of published papers. As noted in [@bacchelli2013expectations], the concept of modern code
review was introduced in 2013. Since then, many researchers have not only explored the modern code
review process and its impact on software quality, but also pointed out possible ways to improve
the application of code review.
We collected and read relevant papers, and in this survey chapter we present an overview of the
current research progress in code review.
## Research protocol
This section describes the review protocol used for the systematic review presented in this
chapter. The protocol has been set up using Kitchenham's method as described by @kitchenham2007.
### Research questions
The goal of the review is to summarize the state of the art and identify future challenges in the
code review area. The research questions are as follows:
* **RQ1**: *What is the state of the art in the research area of code review?* This question
focuses on topics that are researched often, the results of that research, and the research
methods, tools, and datasets that are used.
* **RQ2**: *What is the current state of practice in the area of code review?* This concerns tools
and techniques that are developed and used in practice, by open source projects but also by
commercial companies.
* **RQ3**: *What are future challenges in the area of code review?* This concerns both research
challenges and challenges for use in practice.
### Search process
The search process consists of the following:
* A Google Scholar search using the search query *"modern code review" OR "modern code reviews"*.
The result list, sorted by Google Scholar in order of decreasing relevance, will be considered by
us in order.
* A general Google search for non-scientific reports (e.g., blog posts) and for implemented code
review tools. For these, the search queries *code review* and *code review tools* are used,
respectively. The result list will be considered in order.
* All papers in the initial seed provided by the course instructor will be considered.
* All papers referenced by already collected papers will be considered. We do not follow the
references of papers that were themselves found through this rule; in other words, we do not apply
this rule recursively.
From now on, items from all four categories listed above will collectively be referred to as *resources*.
### Inclusion criteria
From the scientific literature, the following types of papers will be considered:
Papers researching recent code review
* concepts,
* methodologies,
* tools and platforms,
* and experiments concerning the preceding.
From non-scientific resources, all resources discussing recent tools and techniques used in
practice will be considered.
### Exclusion criteria
Resources published before 2008 will be excluded from the study, in order for the survey to show
only the state of the art of the field.
### Primary study selection process
We will select a number of candidate resources based on the criteria stated above. Any person
participating in the review can select a resource as a candidate.
From all candidates, the resources that will actually be reviewed will be selected. This can also
be done by any person participating in the review. All resources that are candidates but are not
selected for actual review must be explicitly rejected, with accompanying reasoning, by at least
two persons participating in the review.
### Data collection
The following data will be collected from each considered resource:
* Source (for example, the blog website or specific journal)
* Year published
* Type of resource
* Author(s) and organization(s)
* Summary of the resource (at most 100 words)
* Data for answering **RQ1**:
- Sub-topic of research
- Research method
- Tools used for research
- Datasets used for research
- Research questions and their answers
* Data for answering **RQ2**:
- Tools used
- Company/organization using the tool
- Evaluation of the tool
* Data for answering **RQ3**:
- Future research challenges posed
- Challenges for use in practice
All data will be collected by one person participating in the review and checked by another.
## Candidate resources
In the appendix, all candidates that are collected using the described search process are
presented. The `In survey` column in the tables indicates whether the paper has been
included in the survey in the end or if it has been excluded for some reason. If it has been
excluded, the reason will be stated in the section *Excluded papers*.
### Initial seed
This table lists all initial seed papers provided by the course instructor that conform
to the stated criteria. They are listed in alphabetical order of the first author's name, and then
by publication year, in Table 1 in the appendix.
### Google Scholar
This table lists all candidates that have been collected through the Google Scholar search
described in the search process. They are listed in alphabetical order of the first author's name,
and then by publication year. Note that, as described in the search process section, papers are
considered in order of their position in the search results. The *Search date* and *Result number*
columns indicate the date on which the search was executed and the position in the search result
list, respectively. The table can be found as Table 2 in the appendix.
### By reference
We list all candidates that have been found through being referenced by another collected paper in
Table 3 in the appendix.
## Answers
### RQ1
**RQ1** is *What is the state of the art in the research area of code review?*. As stated in the
introduction section, this question concerns itself with topics that are researched often, the
results of that research, and research methods, tools and datasets that are used. Each of these
topics will be discussed in the answer to this question.
#### Research methods
First, let us consider the research methods that are generally being used by the papers we
incorporated in the survey. The majority of the papers we considered do quantitative research
[@baysal2016investigating; @beller2014modern; @czerwonka2015code; @gousios2014exploratory;
@mcintosh2014impact; @baysal2013influence; @shimagaki2016study; @thongtanunam2015should;
@thongtanunam2016revisiting; @thongtanunam2017review; @zanjani2016automatically] and some
qualitative research has also been done [@bacchelli2013expectations; @bird2015lessons;
@gousios2014exploratory; @mcintosh2014impact; @shimagaki2016study; @thongtanunam2014reda].
The quantitative research mainly focuses on open-source projects, while the qualitative research
often also considers closed-source projects. This is probably because the development history, and
hence also the code history, is far easier to access for open-source projects than for
closed-source projects. The qualitative research mainly relies on interviews, mostly with
developers. This is probably because developers of proprietary projects are easier to reach for
interviews, for example because they often work in one place.
All research that is considered in this survey was done empirically. In other words, no explicit
experimental setups were created just for the purpose of the research; instead, all studies were
performed on existing situations.
#### Tools and datasets
All papers that do quantitative research use some tools for processing the data. None of the papers
in this survey use pre-built tools. All of them have created some custom tools to process the data
[@baysal2016investigating; @beller2014modern; @czerwonka2015code; @gousios2014exploratory;
@baysal2013influence; @shimagaki2016study; @thongtanunam2015should; @thongtanunam2016revisiting;
@thongtanunam2017review; @zanjani2016automatically]. This mostly concerns custom code in the R
programming language that is created for this specific use case only. Some of the papers make the
source code of the tools they use available online, so it can be used by other people. The paper
by Gousios et al. [@gousios2014exploratory] does this, for example.
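To give an impression of what such custom analysis code typically looks like, the following is a
minimal sketch in R. It assumes a hypothetical CSV export of review records (one row per review
comment, with `patch_id`, `reviewer`, and `merged` columns); it is not taken from any of the
surveyed papers, but illustrates the kind of one-off scripting they describe.

```r
# Hypothetical sketch of the kind of custom analysis scripting described above;
# the input format is an assumption made for this example only.
# Assumed CSV layout: one row per review comment, with columns
#   patch_id, reviewer, merged (TRUE/FALSE for the whole patch)
reviews <- read.csv("reviews.csv", stringsAsFactors = FALSE)

# Review participation: number of distinct reviewers per patch.
reviewers_per_patch <- tapply(reviews$reviewer, reviews$patch_id,
                              function(r) length(unique(r)))

# Discussion length: number of review comments per patch.
comments_per_patch <- tapply(reviews$reviewer, reviews$patch_id, length)

# Outcome of each patch, aligned with the vectors above.
merged <- tapply(reviews$merged, reviews$patch_id, any)

# Compare discussion length between merged and rejected patches.
summary(comments_per_patch[merged])
summary(comments_per_patch[!merged])
```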
As for the datasets used: a few open-source projects appear very often in the quantitative
research, notably Qt [@mcintosh2014impact;
@thongtanunam2015should; @thongtanunam2016revisiting; @thongtanunam2017review; @yang2016mining],
Android [@thongtanunam2014reda; @thongtanunam2015should; @thongtanunam2017review; @yang2016mining;
@zanjani2016automatically], OpenStack [@thongtanunam2015should; @thongtanunam2016revisiting;
@thongtanunam2017review; @yang2016mining], LibreOffice [@thongtanunam2015should; @yang2016mining],
and Eclipse [@yang2016mining; @zanjani2016automatically]. Apart from these, mostly other open-source
projects have been used, along with a few closed-source projects, for example from Sony Mobile
[@shimagaki2016study] or from Microsoft [@zanjani2016automatically]. What is used from these
projects is often only the data from the code reviewing system, possibly along with the code that
is under review. As stated above, the reason that open-source projects are used much more as
datasets is probably that they are much easier to access.
On another note, the paper by Yang et al. [@yang2016mining] introduces a well-structured new
dataset for research purposes and as a benchmark on which code review tools can be tested and
compared. In this way, it aims to create a clearer research environment.
#### Research subjects
The surveyed papers broadly consider four research subjects, namely factors that the code review
process influences, factors that influence the code review process, general characteristics of the
code review process, and the performance of tools assisting the code review process. These subjects
will be discussed below.
**Factors that the code review process influences**: Bacchelli and Bird [@bacchelli2013expectations]
found that code improvement is the most prevalent result of code reviews, followed by code
understanding among the development team and social communication within the development team. They
note that finding errors is not a prevalent result of doing code reviews, contrary to what most
participants expect from it. In numbers, they find that 14% of code review comments were about
finding defects, while as much as 44% of the developers indicate finding defects as the main
motivation for doing code reviews.
Somewhat to the contrary, McIntosh et al. [@mcintosh2014impact] find that low participation in code
reviews does lead to a significant increase in post-release defects, which suggests that reviews in
which developers show much activity actually help in finding defects. They additionally find that
review coverage, the share of code that has been reviewed, also influences the number of
post-release defects, though not as much as review participation. Shimagaki et al.
[@shimagaki2016study] performed a replication of the aforementioned study by McIntosh et al. at
Sony Mobile. Their results were the same for review coverage, and partly the same for review
participation. Contrary to McIntosh et al., they found that the reviewing time and discussion
length metrics for review participation did not contribute significantly to the number of
post-release defects.
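As a concrete illustration of the review coverage metric, the sketch below computes, per module,
the proportion of changes that went through review. The input layout is a simplification assumed
for this example rather than the exact setup used by McIntosh et al.

```r
# Illustrative sketch of a review coverage metric in the spirit of McIntosh et al.;
# the data layout is an assumption made for this example only.
# Assumed CSV layout: one row per commit, with columns
#   module, reviewed (TRUE if the commit went through code review)
commits <- read.csv("commits.csv", stringsAsFactors = FALSE)

# Review coverage per module: share of commits that were reviewed.
review_coverage <- tapply(commits$reviewed, commits$module, mean)

# Modules with the lowest coverage would be the candidates to examine
# for a higher number of post-release defects.
head(sort(review_coverage))
```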
Thongtanunam et al. [@thongtanunam2016revisiting] back up the claim that doing code reviews helps
prevent defects in software, stating that code review activity can help to identify defect-prone
software modules. They also state that developers who do not often author code changes in the
relevant part of the code, but do review frequently, can still deliver good code reviews.
Only when a developer neither authors nor reviews much code can the code quality decrease
significantly.
**Factors that influence the code review process**: To start, Baysal et al.
[@baysal2016investigating] found that mainly the experience of the writer of a patch influences
the outcome (i.e., accepted or not) of the review. Gousios et al. [@gousios2014exploratory] do not
fully agree with this in the context of GitHub pull-requests. They found that only 13% of
pull-requests are rejected due to technical reasons, and as much as 53% due to aspects of the
distributed nature of pull requests. Thongtanunam et al. [@thongtanunam2017review] add that a low
number of reviewers on a submitter's prior patches and a long time since the last modification of
the files touched by the patch make it difficult to find reviewers for a patch; the latter is also
observed by Gousios et al. in the context of pull requests [@gousios2014exploratory]. Although this
does not mean a patch gets closed immediately, the effect may be the same in the long run. Contrary
to what one would expect, they also find that the presence of test code in the patch does not
influence the decision to merge it.
Related to this, some submitted patches may simply take a long time to be reviewed. Baysal et al.
[@baysal2016investigating] attribute this to the patch size, which component the patch is for,
organizational affiliation of the patch writer, the experience of the patch writer, and the
amount of past activity of the reviewer. Regarding the last point, Bacchelli and Bird
[@bacchelli2013expectations] note that understanding the code under review is an important
challenge for the reviewer. Gousios et al. [@gousios2014exploratory] add that the size of the
project, its test coverage, and the project's track record of accepting external contributions are
also relevant.
Thongtanunam et al. [@thongtanunam2015should] add that not being able to find a reviewer for a
patch can significantly increase the time required to merge it, by 12 days on average. In their
research, 4%-30% of reviews had this problem, depending on the project.
Thongtanunam et al. [@thongtanunam2017review] also note that if a previous patch of a submitter
took a long time to review, a new patch is very likely to have the same problem. They also point
out that a patch takes longer to merge if its purpose is to introduce new features. According to
Gousios et al. [@gousios2014exploratory], most patches are merged or rejected within one day.
Thongtanunam et al. [@thongtanunam2017review] also found that short patch descriptions, a small
code churn, and a small discussion length decrease the chance that a patch will be discussed.
Czerwonka, Greiler, and Tilford [@czerwonka2015code] add that when the number of changed files
rises above 20, the amount of useful feedback decreases.
**Characteristics of the code review process**: Beller et al. [@beller2014modern] found that 75% of
the changes in code under review are related to evolvability of the system, and only 25% to its
functionality. They also note that the amount of code churn, the number of changed files, and the
task type determine the number of changes that are made while a patch is under review. According to
Gousios et al. [@gousios2014exploratory], most patches are small: in the context of pull requests,
most are less than 20 lines long. They also note that discussions are only 3 comments long on
average. Beller et al. [@beller2014modern] note that 78-90% of the changes made during review are
triggered by those comments; the source of the remaining changes is unknown to them.
Another interesting point is that only 14% of the repositories on GitHub actually use pull requests
[@gousios2014exploratory]. This may not readily generalize to the share of changes that is code
reviewed across all projects, but it is a rather low number nevertheless. Thongtanunam et al.
[@thongtanunam2016revisiting] add that most developers who only do reviews are core developers,
from which one could infer that most patch submissions (and also pull requests) come from external
contributors. Taken together, this suggests that projects are not yet that open to external
contributions.
**Performance of tools assisting in the code review process**: Two tools are proposed in the papers
that have been surveyed: *RevFinder* [@thongtanunam2015should] and *cHRev*
[@zanjani2016automatically]. Both tools aim to automatically recommend reviewers for a patch
submission, in order to make patch processing faster. RevFinder works by looking at the paths of
files that reviewers have reviewed previously: it recommends the reviewers whose file path review
history looks most like the file paths of the current patch submission, using several string
comparison techniques. cHRev improves on this by also considering how often a reviewer has reviewed
changes to a certain component, and how recently. It has much better accuracy than RevFinder: in
the empirical evaluation, RevFinder recommends correct reviewers with a median rank of 4 (i.e., in
the median case a good reviewer candidate appears at position 4 in the recommendation list),
whereas for cHRev fewer than 2 recommendations are needed to find a good reviewer candidate.
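To make the intuition behind these recommenders more concrete, the following is a simplified,
hypothetical sketch of path-based reviewer recommendation in the spirit of RevFinder. It uses only
a single longest-common-prefix comparison, whereas the actual tool combines several string
comparison techniques, and cHRev would additionally weight each past review by its frequency and
recency.

```r
# Simplified, hypothetical sketch of path-based reviewer recommendation,
# loosely following the idea behind RevFinder (not the tool's actual code).
# Assumed input: past_reviews, a data frame with columns reviewer and path
# (one row per file reviewed by that reviewer in a past patch).

# Similarity of two file paths: shared leading path components, normalized.
path_similarity <- function(a, b) {
  pa <- strsplit(a, "/")[[1]]
  pb <- strsplit(b, "/")[[1]]
  n  <- min(length(pa), length(pb))
  common <- 0
  while (common < n && pa[common + 1] == pb[common + 1]) {
    common <- common + 1
  }
  common / max(length(pa), length(pb))
}

# Score every reviewer by how similar their review history is to the new patch,
# and return them ranked from most to least similar.
recommend_reviewers <- function(new_patch_paths, past_reviews) {
  scores <- sapply(split(past_reviews$path, past_reviews$reviewer),
                   function(history_paths) {
                     sum(outer(new_patch_paths, history_paths,
                               Vectorize(path_similarity)))
                   })
  sort(scores, decreasing = TRUE)
}
```

A call such as `recommend_reviewers(c("src/network/http.cc"), past_reviews)` would then return the
candidate reviewers ranked by path similarity, analogous to the ranked recommendation lists
evaluated in the papers.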
### RQ2
When it comes to the application of code review in industry, we collect information from three
perspectives, namely popularity, variety, and the choice of tools. From the papers, we know that
around one fourth of the studied companies regard code review as a regular process, and about 60
percent of respondents implement tool-based code review, according to analyses from different
companies that sell code review tools [@baum2017choice]. Most teams use one specialized review
tool. One third of the teams choose generic software development tools, such as ticket systems and
version control systems. Some development teams indicate that no tool is used in their reviews
[@baum2017choice]. Considering the various tools available for code review, we find two groups. For
some teams, no specialized review software is used; instead, these teams use a combination of an
IDE, a source code management system (SCM), and a ticket system or bug tracker. For other teams,
many review tools were used or mentioned: Gerrit, Crucible, Stash, GitHub pull requests, Upsource,
Collaborator, ReviewClipse, and CodeFlow [@baum2016faceted].
In conclusion, enterprises apply various code review methods depending on their expectations and
requirements. Additionally, we find that the different tools are not easily comparable, as the
research notes that they target different teams, projects, and metrics; it is hard to say that one
tool is generally better than the others. We found that code review is commonly applied in industry
and that it is a good way to safeguard software quality.
### RQ3
*What are future challenges in the area of code review?*
This concerns both research challenges and challenges for use in practice.
Since the concept of modern code review was proposed in 2013 [@bacchelli2013expectations],
many researchers have spent effort on exploring code review.
According to [@beller2014modern; @mcintosh2014impact], modern code review can be regarded as a
lightweight variant of formal code inspection. Formal inspection, however, mandates strict review
criteria and has been shown to improve software quality.
Therefore, at this stage, many papers aim at increasing the understanding of modern code review and
at figuring out how it improves software quality.
In these studies, qualitative and quantitative methods are applied to find out the practical
application and impact of code review, and several challenges and possible improvements are
identified.
* Future research challenges
Firstly, further exploration of modern code review is still needed. Many studies suggest that a
deeper understanding of modern code review would be helpful for future research.
As an example, [@czerwonka2015code] states: "Due to its costs,
code reviewing practice is a topic deserving to be better understood, systematized
and applied to software engineering workflow with more precision than the best practice currently
prescribes."
Specifically, properties of modern code review such as code ownership can be explored, inspired by
[@mcintosh2014impact], which proposed a workflow to quantitatively study the relationship between
code review coverage and software quality.
In [@bacchelli2013expectations], awareness and learning during code review are cited by developers
as motivations for code review. Future research could investigate these aspects more explicitly.
Inspired by the progress in understanding modern code review, researchers also propose possible
topics that can be explored to obtain more findings.
Bacchelli and Bird [@bacchelli2013expectations] suggest further research on code comprehension
during code review.
According to the paper, research on code comprehension has been done with new developers in mind,
but it would also be applicable to code reviews.
The authors note that IDEs often include tools for code comprehension, while code review tools do
not.
According to [@czerwonka2015code], prior research has neglected the impact of undocumented changes
on code review. Future research can focus on this and figure out whether undocumented changes make
a difference.
The authors of [@gousios2014exploratory] propose to research the effect of the democratization of
the development process, which occurs, for example, through the use of pull requests.
Democratization could, for example, lead to a substantially stronger commons ecosystem.
They also suggest research on the formation of teams and management hierarchies in open-source
projects, and on the motives of developers for working in a highly transparent workspace, as prior
work does not take these issues into consideration.
In addition, [@baysal2013influence] calls for research on how best to interpret empirical software
engineering results in the light of contextual factors.
Understanding the reasons behind observable developer behaviour requires an understanding of
contexts, processes, and organizational and individual factors, which helps to determine their
influence on code review and its outcome.
* Future challenges in practice
So far, the code review process has been adopted both in industry and in open-source communities.
In [@bacchelli2013expectations], the authors propose future work on automating code review tasks,
which mainly concerns low-level tasks, like checking boundary conditions or catching common
mistakes.
Similarly, the authors of [@bird2015lessons] suggest exploring an automatic way to classify and
assess the usefulness of comments. This was specifically requested by an interviewee and is still
an open challenge for CodeFlow, an in-house code review tool. They also propose to research methods
to automatically recommend reviewers for changes in the system.
In [@gousios2014exploratory], ways of managing tasks in the pull-based development model can be
explored, in order to increase efficiency and readability.
The paper also gives the example of a tool that would suggest whether a pull request can be merged
or not, since this can be predicted with fairly high accuracy. Therefore, the development of tools
that help the core team of a project prioritize their work can be explored.
Several code review tools, such as CodeFlow, ReDA, and RevFinder, can still be explored further.
According to [@thongtanunam2015should], further research can focus on how RevFinder works in
practice, i.e., how effectively and practically it helps developers by recommending code reviewers
when deployed in a live development environment. According to [@thongtanunam2014reda], the authors
aim to develop a live code review monitoring dashboard based on ReDA. They also aim to create a
more portable version of ReDA that is compatible with other tools supporting the modern code review
(MCR) process.
## Conclusions
While answering **RQ1**, we found that doing code reviews can improve social aspects of development
and improve code robustness against errors. Additionally, the ability to find a reviewer and the
experience of the reviewer significantly influence the code review process, along with technical
and non-technical aspects of the submitter and the patch, such as experience or patch description
length. On another note, most changes made during code review are done for maintainability, not for
resolving errors, and the number of comments made during review is often small. Last, we discussed
two tools for recommending reviewers that aim to reduce the time needed to find appropriate
reviewers.
As for **RQ2**, we noticed that tools like Gerrit, Crucible, Stash, GitHub pull requests, Upsource,
Collaborator, ReviewClipse, and CodeFlow are commonly used in practice. Most research shows that
code review helps to share knowledge and to develop software with fewer defects. It is not very
meaningful to rank the various tools, as code review itself matters much more for software quality
than the choice of a specific tool.
Concerning **RQ3**, we found that since the concept of modern code review was proposed only five
years ago, most studies aim at increasing the understanding of code review and its impact on
software quality. However, prior work is still limited, and some properties of the code review
process have not yet been taken into consideration. Therefore, further study of modern code review
is still needed to obtain more useful information, and, building on the research progress so far,
several possible topics based on prior work can be explored to obtain more findings.
In the practical area, researchers propose further applications of code review, such as automating
code review tasks, which can increase efficiency. The existing code review tools can also be
improved based on how they are used today.