Developer's Guide

Introduction

This document describes the code standards for Bugzilla, and gives tips for Bugzilla developers.

Code is most likely to be integrated into Bugzilla if it follows these guidelines. Sometimes reviewers will allow you to bend the rules, but there must be a very good reason for doing so.

It is always better to contribute some code rather than no code. If your code isn’t perfect, a reviewer will tell you what you need to change in order for us to accept it. Usually, the reviewer will just tell you things that are already in this document.

Existing Bugzilla code does not necessarily conform to these guidelines, but we are working towards them. If you’re only changing 1 or 2 lines, there’s no need to fix up the entire file to make the file conform to these guidelines.

General Guidelines

We have a certain set of “basic principles” that this entire Developer’s Guide is based on:

Bug Fixing

When you fix a bug, ask yourself:

Test Suite

Nothing will get checked into Bugzilla that doesn’t pass the test suite. The test suite is automatically executed on each commit by the Travis CI system to warn us of bad code. It is using Bugzilla’s GitHub mirror. Anything that fails to pass the testing suite will be backed out immediately.

The test suite continues to expand to detect more problems as the Bugzilla architecture matures and our understanding of potential problems improves. If you add new code, consider adding a new test.

Currently, it primarily diagnoses problems in code similar to a “lint” like tool. However, it does do some tests on the small but growing “back end” of Bugzilla.

In order to run the tests, you must have the Test::More and Test::Harness Perl modules installed. You can then ./runtests.pl or ./runtests.pl --verbose from your Bugzilla main directory.

All Files

Perl Code

General

Throwing Errors

Normally, when an error occurs because of user action, you call ThrowUserError, eg:

ThrowUserError("user_made_mistake");

whereas when an error occurs because of a situation that should never happen (indicating a bug), call ThrowCodeError, eg:

ThrowCodeError("bugzilla_is_broken");

However, when it comes to throwing errors, there are a few important rules:

  1. Don’t blame the user if there’s doubt whether it’s the user’s fault.
  2. Don’t tell the user to email the admin if the error doesn’t mean Bugzilla is broken.

That means: use ThrowUserError every time that Bugzilla isn’t broken, even if you’re not sure it’s a user error. This rule overrides the general recommendations at the top of this section.

Taint Mode

All new CGIs should run in Perl’s taint mode, and existing CGIs which run in taint mode must not have taint mode turned off.

Taint mode works by marking untrusted data as “tainted”, and not allowing tainted data to be passed to various places. This ensures a multitude of security holes cannot occur.

For example, the values of $cgi->param variables and data from browser cookies are tainted, and tainted data may not be passed in to the database. Doing so will cause the CGI to generate an error and terminate.

Instead you must first “detaint” the data. The only way to do this directly is to get the untrusted data to match a regular expression and extract parts from it. So for example, to detaint a positive integer in the variable $foo:

if ($foo =~ /^(\d+)$/) {
    $foo = $1;
}
else {
    # ERROR!
}

Afterwards $foo will be detainted.

Note that there are a few convenience functions you should use to make life easier:

For more information on extracting from regular expressions, see perl’s documentation on regular expressions. For more information on taint mode, see perl’s security documentation.

Note that Bugzilla does NOT use DBI “taint-out” mode, so data returned from the database will not be tainted.

Style

Cross-Platform Compatibility

API Documentation

Want to help out with documenting our code? Use the following Perl Documentation example to help you get started:

# Perl code can go here.

=head1 NAME

Bugzilla::FlagType - A module to deal with Bugzilla flag types.

=head1 SYNOPSIS

  my $flaginfo = get($flag_id);
  my $count= count($criteria);
  my $inc_hash = get_inclusions($flag_id);
  my $exc_hash = get_exclusions($flag_id);

=head1 DESCRIPTION

FlagType.pm provides an interface to flag types as stored in Bugzilla.
See below for more information.

=head1 NOTES

=over

=item *

Prior to calling routines in this module, it's assumed that you have
already done a C<require CGI.pl>.

=item *

Use of private functions/variables outside this module may lead to
unexpected results after an upgrade.  Please avoid using private
functions in other files/modules.

=back

=cut

# More perl code can go here.

For those that are not familiar with writing POD, here are a few things to be aware of:

For an example of this POD style, see Bugzilla/DB.pm

To learn more about inline Perl Documentation, see the manual pages:

SQL

General

Schema Changes

If you make schema changes, you should modify:

If you’re changing the definition of a column, your checksetup code needs to only run if the column has the old definition. That is, it needs to do something like this:

# If bugs.qa_contact is currently defined as NOT NULL, re-define it without that.
if ($dbh->bz_column_info('bugs', 'qa_contact')->{NOTNULL}) {
    $dbh->bz_alter_column('bugs', 'qa_contact', {TYPE => 'INT3'});
    ## ... do other stuff here to fix up the new column ... ##
}

How Schema Updates Work

When Bugzilla updates the on-disk database schema, it also updates a Bugzilla::DB::Schema object in the bz_schema table. This is in Data::Dumper format, and shows what Bugzilla thinks the on-disk structure is.

Why do we do this? Well, because Bugzilla has one idea of what a field’s type is, and the underlying database frequently has a different idea. For example, in MySQL, bugs.bug_id has the following definition:

bug_id MEDIUMINT AUTO_INCREMENT NOT NULL PRIMARY KEY

In PostgreSQL, that same field has the following definition:

bug_id SERIAL UNIQUE NOT NULL PRIMARY KEY

See how different those are? And yet, to Bugzilla, both of these are represented with the following code:

bug_id => {TYPE => 'MEDIUMSERIAL', NOTNULL => 1, PRIMARYKEY => 1}

Later, if we want to change bug_id to say, a string (which I hope we never do, but who knows), in checksetup.pl we need to only do that change if it’s currently not already done. That is, we need to check if bug_id is a “MEDIUMSERIAL” and if so, change it to a string.

But guess what? The underlying database has no idea if it’s a MEDIUMSERIAL. But the schema we stored in the bz_schema table, when we first installed Bugzilla–it knows!

This becomes even more important when you think about the fact that on PostgreSQL, INT1, INT2, INT3, and INT4 are all represented as INTEGER on the disk. So sometimes Bugzilla needs to know whether something is an INT3 or an INT2, and it can’t just ask the database itself. It has to use the information stored in the bz_schema table.

How to Send And Receive Information From/To the Database

Bugzilla is using standard DBI functions to interact with the database, through Bugzilla->dbh. The current recommended method for creating a query is as follows:

use Bugzilla;
my $dbh = Bugzilla->dbh; # Connects if not already connected.
                         # Also handles db, user, password...
my $data = $dbh->selectall_arrayref("SELECT foo, bar FROM bath WHERE log = ?",
                                     undef, "foobar");

foreach my $row (@$data) {
    my ($foo, $bar) = @$row;
    # do whatever with $foo and $bar
}

For more information, see the DBI documentation or the Bugzilla::DB documentation.

Placeholders

Note that in the above example we use a question mark instead of inserting the string “foobar” directly into the SQL. That question mark is called a “placeholder.” This is the recommended method for passing in all variables into SQL statements.

Here’s an example of some SQL with a placeholder:

SELECT short_desc FROM bugs WHERE bug_id = ?

Then, your perl code looks something like this:

my $bug_id = 2;
my ($short_desc) = 
    $dbh->selectrow_array("SELECT short_desc FROM bugs WHERE bug_id = ?",
                             undef, $bug_id);

JOIN Statements and SELECTing From Multiple Tables

Frequently when using SQL you want data from more than one table at a time. You do this by “joining” the tables in various ways.

For this section, we will be using two tables in our examples, called a and b. Here is what they look like:

user_id user_name
1 jdoe
2 bsmith
3 mkanat
user_id email
1 [email protected]
2 [email protected]
2 [email protected]

Note that in the below examples we use SELECT * for simplicity, but SELECT *should never be used in Bugzilla code.

CROSS JOIN

The least common (but also least understood) type of JOIN is the “cross join,” also called the “cross-product join.”

A cross join means, “return all possible combinations of rows in these two tables.”

In SQL, there are two ways to do a cross join of tables a and b:

SELECT * FROM a, b or SELECT * FROM a CROSS JOIN b

The two pieces of code above are identical. The first one is called an “implicit” join (because the database just “figures out” that we mean CROSS JOIN, from the comma), and the second is called an “explicit” join (because we specified exactly what we want).

If we run the SQL above with our a and b tables, the result looks like:

a.user_id a.user_name b.user_id b.email
1 jdoe 1 [email protected]
1 jdoe 2 [email protected]
1 jdoe 2 [email protected]
2 bsmith 1 [email protected]
2 bsmith 2 [email protected]
2 bsmith 2 [email protected]
3 mkanat 1 [email protected]
3 mkanat 2 [email protected]
3 mkanat 2 [email protected]

In general, you want to avoid cross joins unless you are certain they are exactly what you need.

INNER JOIN

An INNER JOIN is one where you try to actually join two tables together based on a column that they have in common. This is the most common type of join.

For our a and b tables, we have two ways of writing an INNER JOIN:

SELECT * FROM a INNER JOIN b ON a.user_id = b.user_id or SELECT * FROM a, b WHERE a.user_id = b.user_id

Just like with the cross join, the first version is an “explicit” join, and the second one is an “implicit” join.

The inner join above looks like this:

user_id user_name email
1 jdoe [email protected]
2 bsmith [email protected]
2 bsmith [email protected]

Note that “mkanat” (with user_id 3) is left out, because he doesn’t have an entry in both tables.

LEFT JOIN

A LEFT JOIN is like an INNER JOIN, but it includes records that have an entry in a but no corresponding entry in b. That is, where “mkanat” was left out in our INNER JOIN above, he would be included in our LEFT JOIN.

A LEFT JOIN is normally only done one way:

SELECT * FROM a LEFT JOIN b ON a.user_id = b.user_id

The result of that code looks like this:

user_id user_name email
1 jdoe [email protected]
2 bsmith [email protected]
2 bsmith [email protected]
3 mkanat NULL

There was no “email” for “mkanat,” so the database returns NULL there.

It’s called a “left” join because we include entries from the table on the left (table a, note that it’s on the left side of the words “LEFT JOIN”) that don’t have a match in the table on the right side.

Warning: If we add a WHERE email = ? to the above SQL statement, the row containing mkanat will never be returned. This is because NULL is never equal to anything.

Indexes

Simple Description of Indexes

Indexes speed up SELECT statements.

Normally, if you have a table of 10,000 rows, and you want row number 8000, the database has to go through all 10,000 rows to find your row. That’s called a “table scan,” and it’s usually slow.

With an index, you ask it for “row 8000” and it gives it to you instantly.

More Details

OK, so in reality you wouldn’t be asking for “row 8000,” you’d be doing something like WHERE bugs.bug_id = 8000 in your SQL.

In order for the database to do that quickly, it needs an index on the bugs table, on the bug_id column.

When To Add An Index

Generally, you should add an index any time you plan to use a column in a WHERE clause, or in the “ON” part of a JOIN statement.

Adding too many indexes to a table will slow down INSERT and UPDATE statements, so don’t add indexes you don’t need.

Multi-Column Indexes

Many databases (MySQL in particular) will only use one index per table per query.

That is, imagine that we do: SELECT * FROM bugs WHERE priority = 'P1' OR op_sys = 'Windows'. We have two indexes on the bugs table, one for “priority” and one for “op_sys.” Only one of them will be used. The one that’s used is up to the database, it will try to pick the “best” one to use, and then do a table scan for the other one.

If we want that to be fast, we can create a single index that contains the values for both priority and op_sys. That is called a multi-column index.

Multi-column indexes are always in a certain order. For example, either “op_sys, priority” or “priority, op_sys.” For your purposes, those two indexes are totally identical, with one exception: The first column of a multi-column index can be used like it’s a single-column index. So, for example, having an index on both “priority, op_sys” and “priority” would be redundant. Don’t do it.

Write Queries With Indexes In Mind

When writing a query, give a brief thought to how it will use indexes. Don’t try to solve performance problems before you know they exist (remember our rules from the top of the Developer’s Guide), but do give a brief thought to it, particularly the fact that only one index will be used per table, per query, by common databases.

Cross-Database Query Compatibility

Different databases support different things, and different databases use different syntax for similar features. Bugzilla->dbh provides many functions that will generate correct SQL for the database that Bugzilla is using. These functions all have names that start with sql_.

Here are the MySQL constructs that must be replaced with Bugzilla::DB sql_ functions:

Also, if you absolutely need to do a case-insensitive string comparison, use sql_istrcmp instead of = or other operators.

The following SQL constructs should never be used. Replace them with something else instead:

The following query features are not currently available in all databases that Bugzilla supports, and so should be avoided:

A Note About INNER JOIN

In general, implicit INNER JOINs (FROM a, b WHERE a.user_id = b.user_id) are faster than explicit INNER JOINs (FROM a INNER JOIN b ON a.user_id = b.user_id). This is because explicit INNER JOINs bypass the query optimizer on some databases and specify that the database must use that table order.

However, you can’t mix explicit and implicit joins. It works on older MySQL versions (before 5.0), but that’s it. So, if you have any LEFT JOIN statements in your SQL, you cannot have any “comma” joins.

In general, the performance penalty for explicit INNER JOINs should only happen on older databases (PostgreSQL before 7.4, for example), so don’t worry about it too much. When in doubt, do an explicit join.

Style

Templates

General

Filenames and Paths

HTML Templates

Web Technologies

Security

Please consider the security implications of your code. In particular, thoroughly check user permissions, and thoroughly check data is in the right format.

Perl is a reasonably secure language. Buffer overflow and format string attacks are believed to not be possible.

Furthermore, Perl’s taint mode can catch a wide range of failure-to-validate errors and transform them from potential security holes into mere errors.

But there will always be security problems (see below), as no language can prevent all types of problem.

Don’t Trust URL Parameters And Cookies!

URL parameters and cookies are under the control of the user, and hence should not be trusted. Make sure they are in the correct format before using them, and don’t trust anything they say about the user’s authority or identity, other than through the standard Bugzilla authentication mechanisms.

To find a user’s identity, examine the properties of the User object returned by Bugzilla->user after the appropriate call to Bugzilla->login. See Bugzilla::User for details.

Confidential Information Leakage

Bugzilla has various facilities to restrict products and bugs to users in certain groups. When users can bypass this mechanism, it is a security hole.

This can occur when you fail to check appropriate permissions before doing something. Please consider this in your code.

Also, arbitrary user-passed SQL can be used in bug lists to return unexpected bugs, so you shouldn’t allow it.

Note that even the name of a product can be confidential. Whenever possible, don’t inform the user that something exists, if they can’t access it.

Unauthorised Access to Perform an Action

This is allowing a user to do something they shouldn’t be able to do, like edit a bug they don’t have access to. Similar to confidential information leakage, this also often occurs when you fail to do an appropriate check.

Arbitrary Code Execution

This is a very serious hole, as it often has the potential to let the attacker do a wide range of things with a system. It occurs when a user can pass code that then runs, including Perl, HTML, shell-script or SQL.

Generally this isn’t intentional, but instead occurs when you fail to properly validate or escape a string you pass as a part of some code.

Some instances of this should be picked up by taint mode:

Some aren’t, but will probably be picked up by the testing suite: