The trouble with Chromium translations

Most applications that are intended for a broad international audience have their UI translated to various languages, the number of which can vary widely, depending on the resources of the vendor, especially their ability to recruit translators.

Vivaldi is currently being translated to 91 languages, a few more than Chrome.

Under the hood Vivaldi’s UI string (“string”: The term used in computer programming for a section of text) translation system actually consists of two independent systems, the Chromium one, and the system used by Vivaldi’s native UI.

This article will only cover the Chromium system and the challenges of using it.

The Chromium string/text translation and resource system consist of two kinds of files:

  • The GRD files, which can declare the US English version of the strings, and the location of various file resources like icons and documents (HTML, JS, CSS) used by the Chromium UI, and
  • The XTB files, one for each language, contain the various translations of the original strings in the associated GRD file.

When building the product (Vivaldi, in our case) these files are processed by various scripts in the build system and converted into files that can be handled by the Chromium code handling strings and the resources.

One of the challenges for a product like Vivaldi is that the strings and resources defined by the Chromium project are very specific to Chromium and Google Chrome, such as logos and the company and product names used in the string.

Of course, we in Vivaldi want to use the Vivaldi logo, and to use “Vivaldi” and name of the product and company. Duh!

That means that we have to change the resource definitions and the strings (and translations) to use Vivaldi’s preferred resources and names.

However, if you’ve read my article about maintaining a Chromium fork, you will have noticed that I said that you should never modify the Chromium translation files. Yet, I just said above that to use our preferred resources and strings we have to change the files. Why should we not change the files, and how do we work around the problem to still be able to use our chosen resources?

The reason why we should not change the files comes down to two reasons:

First, the Chromium resource files are frequently updated by the Chromium team. New resources and new strings are added, and old ones are changed to improve their meaning, and occasionally some are removed. All of these changes mean that when upgrading the Chromium source code, there is a significant risk that these changes will occur to the specific lines of the file we modified, or close to them. That means that we would have to resolve the conflicts between the new text and our changes, which will significantly increase the time needed to complete the update.

vivaldi/chromium translation strings

Second, for the strings we would have to not just modify the GRD file entry, we would have to modify the corresponding entry in each of the 80+ translation XTB files associated with each file, and to top it off, each of those entries has a numeric identifier calculated from the original string in the GRD file, so if you change the original string, you have to recalculate the value and update each XTB file for that entry. Ouch! Lots of work. Additionally, each of those updated entries in each file is another possible update merge conflict that has to be resolved manually. Double ouch!

So, how do we resolve this problem? How do we update the resources, strings, and translations without modifying the Chromium resource files? The answer is that we do change them, but we don’t change them.

What we have done in Vivaldi is to create our own resource GRD and XTB files for each set of Chromium resource files that we want to update, and add our file resources, strings, and translations in these files. The translation files are usually used to add the translations for the extra languages we support, but in some cases we do an extensive rewrite of the original string, which require more translations to be added in our version.

Then, while building the application we have updated the project and the scripts it used to automatically insert our updated changes into the data, before they are used to generate the binary files used by the application.

The result is that we don’t have to update the original files, but we can update the resources, strings, and translations.

This process is also used to automatically replace mentions of Chromium and Google Chrome company and product names with Vivaldi’s name, both in the original US English strings and the translations. This process does have its challenges, especially since “Google” is frequently used in combination with other words to name products we don’t support, like “Google Pay”, so we have to exclude such replacements.

Occasionally, there are strings that mention the Google, Chrome, or Chromium names when replacing them with Vivaldi is not desirable (and an example just showed up in the forums\_=1661690451983\, where information about a system Google is working on said Vivaldi instead, that has now been “fixed”), and in these cases, we exclude that particular string from being replaced.

Another recent example was the string “Chrome is made possible by the Chromium open-source project”, which was auto replaced into “Vivaldi is made possible by the Vivaldi open-source project”, not “Vivaldi is made possible by the Chromium open-source project”. Oooops! That was fixed by adding a full override of the text with correct wording.

Could we avoid using this kind of system? Well, it is not the only way to implement such a system.

One could add an independent set of resource files (and we have those for our own), and add our replacements in those files using different identifiers for them and replace the originals everywhere they are used. However, we would still have the problem with later updates, both of the strings and their meaning, and starting to use them elsewhere (which would have to be discovered and updated). Then there is the issue of more potential merge conflicts during updates.

Quite simply, using different identifiers would not work very well, since their use would have to be maintained continuously. Just replacing the original entries will generally work better.

And that ignores the use of product names in many strings. There are a lot of those names used around the code, and copying and modifying them into a different set of files would be a major undertaking, and would still have to be updated with new strings every Chromium upgrade.

The best way to avoid the search and replace of product names (and thus avoid the funny cases) would be for the Chromium team to stop using “Google”, “Google Chrome”, “Chromium” etc. hardcoded into the strings, but instead using variables that can insert the downstream project’s own preferred name in those strings. This kind of project would be a major undertaking by the Chromium team, and I sort of doubt they would be willing to take it on.

What do the other Chromium-based browser teams do? I have absolutely no idea. Maybe they use a similar system, or they have found their own way to manage the issue.

Soooo … you say you want to maintain a Chromium fork?

Tree with branches at Innovation House Magnolia
The branches of a tree at Innovation House Magnolia

Photo by Ari Greve

(Note: this article assumes you have some familiarity with Git terminology, building Chromium, and related topics)

Building your own Chromium-based browser is a lot of work, unless you want to just ship the basic Chromium version without any changes.

If you are going to work on and release a Chromium-derived browser, on the technical side you will need a few things when you start with the serious work:

  • A Git source code repository for your changes
  • One or more developer machines, configured for each OS you want to release on
  • Test machines and devices to test your builds
  • Build machines for each platform. These should be connected to a system that will automatically build new test builds for each source update, and your work branches, as well as build production (official) builds. These should be much more powerful than your developer machines. Official builds will take several hours even on a powerful machine, and requires a lot of memory and disk space. There are various cloud solutions available, but you should weigh time and (especially) cost carefully. Frankly, having your own on-premises build server rack may cost “a bit” up front, but it lets you have better control of the system.
  • A web site where you can post your Official builds so that your users can download and install them

Now you are good to go, and you can start developing and releasing your browser.

Then … the Chromium team releases a new major version (which they do every 4 or 8 weeks, depending on the track) with lots of security fixes. Now your browser is buggy and unsecure. How do you get your fixes to the new version?

This process can get very involved and messy, especially if you have a lot of patches on the Chromium code. These will frequently introduce merge conflicts when updating the source code to a newer Chromium version because the upstream project have updated the code you patched, or just nearby, but there are a few things you can do about that to reduce the problems.

There are at least two major ways to maintain updates for a code base: A git branch, and diff patches to be applied on a clean checkout. Both have benefits and challenges, but both will have to be updated regularly to match the upstream code. The process described below is for a git branch.

The major rule is to put all (or as much as practical) of your additional independent code that is whole classes and functions (even extra functions in Chromium classes) in a separate repository module that have the Chromium code as a submodule. Vivaldi uses special extensions to the GN project language to update the relevant targets with the new files and dependencies.

Other rules for patches are:

  • Put all added include/imports *after* the upstream includes/import declaration.
  • Similarly, group all new functions and members in classes at the end of the section. Do the same for other declarations.
  • Any functions you have to add in a source file should always be put at the end of the file, or at the end of an internal namespace.
  • Generally, try to put an empty line above and below your patch.
  • Identify all of your patches’ start and end.
  • Don’t change indentation of unmodified original code lines, unless you have to (e.g. in Python files).
  • Repetitive patching of the same lines should be fixuped or squashed. Such repetitions have the potential to trigger multiple merge conflicts during the update, which could easily cause errors and bugs to be introduced.
  • NEVER (repeat: NEVER!!!) modify the Chromium string and translation files (GRD and XTB). You will be in for a world of hurt when strings change (and some tools can mess up these files under certain conditions). If you need to override strings add the overrides via scripts, e.g. in the grit system merging your own changes with with the upstream ones (Vivaldi is using such a modified system; if there is enough interest from embedders we may upstream it; you can find the scripts in the Vivaldi source bundle if you want to investigate).

Vivaldi uses (mostly) Git submodules to manage submodules, rather than the DEPS file system used by Chromium (some parts of Vivaldi’s upstream source code and tools are downloaded using this system, though). Our process for updating Chromium will work whichever system is used, with some modifications.

The first step of the process is identifying which upstream commit (U) you are going to move the code to, and what is the first (F) and last (L, which you create a work branch W for) commit you are going to move on top of that commit. If you have updated submodules you do this for those as well.

(There are different ways to organize the work branch. We use a branch that is rebased for each update. A different way is to merge the upstream updates into the branch you are using, however this quickly gets even messier than rebasing branches, especially when doing major updates, and after two years of that we started rebasing branches instead.)

The second step is to check out the upstream U commit, including submodules. If you are using Git submodules you configure these at this stage. This commit should be handled as a separate commit, and not included in the F to L commits.

Then you update the submodules with any patches, and update the commit references.

The resulting Chromium checkout can be called W_0

Now we can start moving patches on top of W_0. The git command for this is deceptively simple:

git rebase --onto W_0 F~1 W

This applies each commit F through to L (inclusive) in sequence onto the W_0 commit and names the resulting branch W.

A number of these commits (about 10% of patched files in Vivaldi’s source base) will encounter merge conflicts when they are applied, and the process will pause while you repair the conflicts.

It is important to carefully consider the conflicts and whether they may cause functionality to break, and register such possibilities in your bug tracking system.

Once the rebase has completed (a process that can take several workdays) it is time for the next step: Get the code to build again.

This is done the same way as you normally build your browser, fixing compile errors as they are encountered, and yet again registering any that could potentially break the product. This is also a step that can take several work days. A frequent source of build problems are API changes and retired/renamed header files.

Once you have it built and running on your machine, it is time to (finally) commit all your changes and update the work branch in the top module and push everything into your repository. My suggestion is that patches in Chromium are mostly committed as “fixups” of the original patch; this will reduce the merge conflict potential, and keeps your patch in one piece.

Then you should try compiling it on your other delivery platforms, and fix any compile errors there.

Once you have it built and preferably have it running of the other platforms, you can have your autobuilders build the product for each platform, and start more detailed testing, fixing the outstanding issues and regressions that might have been introduced by the update. Depending on your project’s complexity, this can take several weeks to complete.

This entire sequence can be partially automated; you still have to manually fix merge conflicts and compile errors, as well as testing and fixing the resulting executable.

At the time of writing, Vivaldi has just integrated Chromium 104 into our code base, a process that took just over two weeks (the process may take longer at times). Vivaldi is only using the 8-week-cycle Extended Stable releases of Chromium due to the time needed to update the code base and stabilize the product afterwards. In our opinion, if you have a significant number of patches, the only way you can follow the 4 week cycle is to have at least two full teams for upgrades and development, and very likely the upgrade process will have to update weekly to the most recent dev or canary release.

Once you get your browser into production every couple of weeks you are going to encounter a slightly different problem: keeping the browser up to date with the (security) patches applied to the upstream version you are basing your fork on. This means, again, that you have to update the code base, but these changes are usually not as major as they are for a major version upgrade. A slightly modified, less complicated variant of the above process can be used to perform such minor version updates, and in our case this smaller process usually takes just a few hours.

Good luck with your brand new browser fork!

Not out of the woods yet: There are more POODLEs

As I wrote in my previous article about this, in October a group of Google security researchers had discovered a problem, called POODLE, in SSL v3 that in combination with another issue, browsers’ automatic fallback to older TLS and SSL versions, allowed an attacker to quickly break the encryption of sensitive content, like cookies.

The main mitigating methods for this problem are disabling SSL v3 support, both server side (now down to 66.2%, but slowing down) and in the client, and to limit the automatic fallback, either by not falling back to SSL v3 (which is now implemented by several browsers), or by a new method called TLS_FALLBACK_SCSV (introduced by Google Chrome and others). Continue reading “Not out of the woods yet: There are more POODLEs”

Attack of the POODLEs

Three weeks ago a group of researchers from Google announced an attack against the SSL v3 protocol (the ancestor of the TLS 1.x protocol) called POODLE (a stylish abbreviation of “Padding Oracle On Downgraded Legacy Encryption”). This attack is similar to the BEAST¬†attack that was revealed a few years ago, and one of the researchers that found the POODLE attack was part of the team that found BEAST.

POODLE is able to quickly discover the content of a HTTPS request, such as a session cookie, but only if the connection is using the SSL v3 protocol, a version of SSL/TLS that became obsolete with the introduction of TLS 1.0 in 1999. As almost all (>99%) secure web servers now support at least TLS 1.0 (which is not vulnerable to the attack, provided the server is correctly implemented), it might sound like this attack is not very useful. Unfortunately, that is not so. Continue reading “Attack of the POODLEs”