Genesis
When I encountered a Java library called POI during my work, I began to ponder: why can’t I find a Rust library like POI that can edit XLSX files? This question sparked the idea of developing a Rust library named edit-xlsx.
During my work, I realized that while there are Rust libraries available for reading and writing XLSX files, there isn’t one specifically tailored for editing them. That’s why I decided to embark on developing the edit-xlsx library.
The goal of edit-xlsx is simple: to provide users with a convenient and easy-to-use tool to effortlessly edit existing XLSX files. This means users can modify, add, or delete content from files without manually handling the complex details of the XLSX format.
Through the development of the edit-xlsx library, I aim to fill this gap in the Rust ecosystem and provide developers with a handy tool to more easily handle XLSX files without relying on other languages or platforms. This project is driven by my passion for the Rust language and my desire to solve real-world problems.
By the time you read this blog post, I have already released the edit-xlsx project on crates.io. Additionally, I will periodically document its updates.
Introduction to OpenXml and Xlsx
OpenXML is an open file format based on XML (eXtensible Markup Language) used for creating and managing electronic documents such as documents, spreadsheets, and presentations. It was introduced by Microsoft in 2006 and became part of the Office Open XML (OOXML) standard. XLSX files are one of the spreadsheet file formats created using the OpenXML standard.
The OpenXML standard defines the structure and content of XLSX files, enabling them to be read and edited by different applications without the need for specific software such as Office, LibreOffice, Numbers, or WPS. XLSX files are compressed XML files containing data, formatting, charts, metadata, and other information related to the file.
An XLSX file is essentially a compressed package containing multiple files and folders, collectively forming the structure and content of the spreadsheet.
To facilitate understanding the structure of XLSX files, let’s create an XLSX document containing the most basic content. Note: An XLSX file may contain more than just these files. Furthermore, a document containing only the following files may not necessarily function properly (depending on the processing software and version of the XLSX document).
- new_excel.xlsx
- _rels/
.rels
- [Content_Types].xml
- docProps/
- app.xml
- core.xml
- xl/
- _rels/
- workbook.xml
- styles.xml
- worksheets/
- sheet1.xml
Subsequently, we can gradually understand the purpose of each file:
_rels
: Contains XML files defining relationships between files. The.rels
file specifies the relationships between the workbook and worksheets.Content_Types.xml
: Defines the types of all XML files contained in the XLSX file. It informs applications about the content type of each XML file for proper parsing and handling.docProps
: Contains document property information, typically metadata describing the document’s author, creation date, last modification date, etc. It includes the following files:app.xml
: Contains application-related metadata such as the document’s title, author, subject, etc.core.xml
: Contains core metadata such as the document’s creator, creation date, last modifier, last modification date, etc.
xl
: This folder contains the main content._rels
: Contains XML files defining relationships between different sections of the workbook. For example, theworkbook.xml.rels
file defines relationships between worksheets and shared string files (shared strings will be introduced later).worksheets
: Contains data for each worksheet. In our example, there is only one file namedsheet1.xml
, which contains data and formatting information for the worksheet.sheet1.xml
: Contains data, cell styles, and formatting information for the worksheet.
workbook.xml
: Defines the structure of the workbook, including worksheets, names, and external links. It describes information such as the names and positions of all worksheets contained in the workbook.styles.xml
: Defines styles in the workbook, including fonts, colors, number formats, etc.
Processing XLSX Files
We know that XLSX files are essentially compressed packages of XML files (and folders). Therefore, processing XLSX files mainly involves handling compressed packages (decompression and compression) and XML processing (serialization and deserialization).
Below, I list the libraries I currently depend on to accomplish these tasks. There may be additions or reductions in the dependencies in the future, and I will document these modifications.
Introduction to Zip
Zip is the most downloaded compression library on crates.io (probably due to its name). This library supports the deflate algorithm used by XLSX files, making it suitable for our needs. Therefore, we use the zip library to compress and decompress XLSX files.
Introduction to Walkdir
Walkdir is a Rust library for recursively traversing directory structures and listing files and subdirectories within them. Importantly, this library is recommended by Zip for implementing compression methods. Thus, we combine this library with Zip to achieve compression functionality for XLSX files.
Introduction to Serde
Serde is one of the most popular serialization and deserialization libraries in Rust. Its name is derived from the combination of “serialize” and “deserialize,” aiming to provide a simple and flexible way to convert Rust data structures to and from various data formats.
In our project, we primarily use its derive feature to quickly implement serialization and deserialization for structures and enums.
Introduction to Quick-xml
Quick-xml is the XML serialization/deserialization tool I experimented with extensively and found most suitable for XLSX XML formats. Compared to projects like fast-xml, this project supports a more comprehensive range of features, such as reading XML structure attributes, values, etc., using @
, $
, and other methods. Moreover, this project is still maintained, and I look forward to it adding more user-friendly features in the future.
Conclusion
This blog post documented the reasons behind the development of the edit-xlsx project and the expected technology stack. If time and resources permit, I will continue to document my development progress in this section.