Confused by buzzwords? You’re not alone.
EXPLAINED is a new explainer series for people who work with technology – but don’t want marketing fluff or academic theory.
We break down complex digital topics clearly and practically: what they mean, why they matter, and how they work in the real world.
You may have seen documents change slightly between tools. Sometimes that happens between applications from different companies, and sometimes even between the browser-based and desktop versions of the same company's product. A table moves, a font changes, something is missing, or a layout looks just a bit off.
This happens because editors and file formats describe documents in different ways. The good news is that modern document editors can be designed to handle these linguistic differences.
Not all formats speak the same language
English and Chinese are the two most commonly spoken languages, each with over 1 billion speakers. Both languages are used effectively every day to describe the world, conduct business, govern, teach, and communicate. They don't, however, encode information in exactly the same way, and each contains unique words and features.
Let’s take a look at a simple example:
English: Where is the OOXML proprietary documentation?
Chinese: OOXML专有文档在哪里?(OOXML proprietary documentation is where?)
The difference in word order is obvious. With a little study, even an average student will learn to make the change reliably.
Less obvious, perhaps, is that the Chinese sentence is missing an equivalent of 'the'. This is because Chinese has no definite (or indefinite) article. This creates the possibility of data loss when the sentence is 'round-tripped' from English to Chinese and back to English, perhaps resulting in the erroneous final output, "Where is OOXML proprietary documentation?"
Fortunately for anyone invested in international communications, trade, politics or sport, these translation questions are easily resolved by an experienced translator. Similarly, for anyone interested in document interoperability, the equivalent problem is well handled by a document editor such as Collabora Office.
How document editors should translate
Modern document formats such as OOXML (.docx, .xlsx, .pptx …) and ODF (.odt, .ods, .odp …) are both highly capable and widely used. But like languages, they express certain details differently.
Each editor has a native file format, like a mother tongue. For Collabora Office, that is ODF, which is a truly open format with an open development process. For Microsoft Office, it is OOXML, "a pseudo-standard that pretends to be open".
Ideally, a well-designed document editor acts like a skilled translator. It ensures that documents open, display, and behave consistently, even when moving between formats.
To do this, the document editor uses what are known as import and export filters to ensure that features are handled correctly. This does, however, leave open the question of what happens to features that are only available in one format and therefore cannot be 'translated' (like the 'the' in the example above).
A good way to illustrate this question is by showing what happens when it goes wrong. Microsoft Office opens and edits every file as OOXML. This means that if, for example, you open in PowerPoint an ODP presentation whose template shows the total slide count (a feature OOXML does not have), PowerPoint permanently throws that information away, even if you never save the file as .pptx. PowerPoint's mother tongue is OOXML, so it has no idea what <count> means.
This makes us sad! We like to know how much longer a presentation is going to go on for…
When a feature exists in one format but not the other, the Collabora Office suite tends to use what we call 'grab bags' to temporarily store that feature before saving it again in its original format. Ideally, PowerPoint would operate more like the Collabora Office suite, and recognise this as information it doesn't fully understand but should still retain. Unfortunately, that is not the case.
Regrettably, this principle is not well understood, even by many of those who are aware of the problem of dependency on Microsoft's OOXML. Two of the first five issues raised in a new office suite's GitHub repository are duplicate requests for the suite to default to ODF. Unfortunately, the office suite in question is based on a document editor that primarily speaks OOXML and has poor ODF support, so the requests are unlikely to be actioned.
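To make the grab-bag idea concrete, here is a minimal Python sketch contrasting a lossy import filter with one that keeps a grab bag. It is only an illustration under simplifying assumptions: a "document" is modelled as a plain dictionary of named features rather than real ODF or OOXML XML, and every name in it (KNOWN_FEATURES, InternalDoc, slide_count_field and the functions) is hypothetical, not Collabora Office's actual code.

```python
from dataclasses import dataclass, field

# Hypothetical set of features this toy editor's internal model understands.
KNOWN_FEATURES = {"title", "slides"}


@dataclass
class InternalDoc:
    """Toy in-memory model: understood features plus a grab bag for the rest."""
    features: dict = field(default_factory=dict)
    grab_bag: dict = field(default_factory=dict)


def import_lossy(source: dict) -> InternalDoc:
    # Keeps only what the editor understands; anything else is silently dropped.
    return InternalDoc({k: v for k, v in source.items() if k in KNOWN_FEATURES})


def import_with_grab_bag(source: dict) -> InternalDoc:
    doc = InternalDoc()
    for key, value in source.items():
        if key in KNOWN_FEATURES:
            doc.features[key] = value
        else:
            # Not understood, but stashed so it can be written back unchanged.
            doc.grab_bag[key] = value
    return doc


def export(doc: InternalDoc) -> dict:
    # Write the understood features, then restore anything held in the grab bag.
    return {**doc.features, **doc.grab_bag}


original = {"title": "Q3 review", "slides": 12, "slide_count_field": True}
print(export(import_lossy(original)))          # slide_count_field is gone
print(export(import_with_grab_bag(original)))  # all three features survive
```

The design choice is that the editor never needs to understand the stashed feature; it only needs to remember it and write it back when saving in the original format.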
Not all technologies speak the same language
Many (most?) Microsoft Office users will have noticed that Microsoft Office features look and behave differently depending on whether a document is opened in the browser or in the desktop app. This is because what most people know as "Microsoft Office" is in fact two independently developed sets of applications: Word Desktop and Word Web, for example, are separate programs. Confusing the situation further, Microsoft appears to publish multiple, slightly differing descriptions of how features operate in the two.
Fortunately, with the newly released Collabora Office, users of the Collabora Office suite get one and the same editing and document rendering experience whether they edit documents in the browser or on their desktop, thanks to the codebase shared between the two apps.
Achieving perfect interoperability
We'd like to say that we are perfect translators, understanding everything and converting 100% of documents flawlessly, but this would be untrue. What we can say is that we are excellent at working with both Microsoft file formats and ODF.
Additionally, unlike most Microsoft-format-based editors, we think it is important to work well both with legacy document types and with the open future. Arguments that, because we encourage the use of ODF, we must be unsuitable for .docx or .xlsx files are weak and don't stand up to scrutiny. Just because we're great at ODF doesn't stop us being great at interop too.
While we prefer ODF due to its openness and documentation, we fully support OOXML and keep developing that support, including for the new features that come with new versions of the format. As a company, we deliberately use both formats internally and do not encounter issues in our regular work. With our support, these minor dialectal differences don't need to hold up real-world, large-scale migrations either.
Among other efforts to improve interoperability, over the past year we have been focusing on validity testing: converting about 243 000 documents, spreadsheets and presentations from various formats to the corresponding OOXML format and checking the output, with the goal of getting the number of invalid files to zero. The following chart shows the progress over the last few months for spreadsheets and presentations.
Achieving perfect interop?
You should be able to open a document, edit it, share it, and know it will behave as expected. This will only be fully achieved as the world moves to a truly open standard, and away from the semi-proprietary confusion of a format owned and governed by Microsoft.
Additionally, your documents should behave the same in your browser, on your desktop, or in your pocket. This is now possible with shared-codebase programs like Collabora Online and Collabora Office.
In the meantime, we continue to deal with the reality of ubiquitous legacy documents in Microsoft formats, and are pleased that the vast majority of users will not encounter issues opening and editing Microsoft file types in their day-to-day work with Collabora Office.
If you want to see what that looks like in practice, we encourage you to try out the Collabora Office suite – and enjoy working across platforms and formats with confidence.