What is TGZ to TXZ Conversion?
Converting TGZ to TXZ means repacking the contents of a UNIX tarball from the legacy GZIP compression format into the modern XZ format with the LZMA2 algorithm. The inner TAR container with files remains unchanged: the same records, the same POSIX attributes, the same timestamps. Only the outer compression layer changes. TGZ (TAR + GZIP) uses the DEFLATE algorithm with a 32 KB window; GZIP itself dates from 1992. TXZ (TAR + XZ) applies LZMA2 with a much larger dictionary (64 MiB at xz -9; the xz encoder accepts up to 1.5 GiB), which yields significantly tighter compression. The XZ format was introduced in 2009 as the successor to the LZMA format and quickly became a standard in the Linux ecosystem.
The main reason for converting TGZ to TXZ is modernizing the archive to a current standard. In 2013 kernel.org made tar.xz the primary format for Linux kernel releases and abandoned tar.bz2. Arch Linux, Debian, Ubuntu, and many other distributions use or have used XZ for software packages. The dpkg, rpm, and pacman package managers work with xz natively. Migrating from TGZ to TXZ typically saves 30-50% of space on text-heavy data. The cost is slower compression and somewhat slower, though still fast, decompression.
During conversion, the GZIP layer is decompressed into the original TAR stream and that stream is packed into a new XZ layer. The TAR contents and structure are preserved exactly. The resulting TXZ is usually substantially smaller than the TGZ, especially for source code, documentation, logs, and other repetitive data.
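The repacking described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the converter's actual implementation; the function name `tgz_to_txz` and the default preset are assumptions chosen for the example.

```python
import gzip
import lzma
import shutil


def tgz_to_txz(src_path, dst_path, preset=6):
    """Recompress a .tgz as .txz without touching the inner TAR stream."""
    # gzip.open yields the decompressed TAR bytes; lzma.open wraps them in XZ.
    with gzip.open(src_path, "rb") as tar_stream, \
            lzma.open(dst_path, "wb", preset=preset) as xz_out:
        # Copy in 1 MiB chunks so large archives never sit fully in memory.
        shutil.copyfileobj(tar_stream, xz_out, length=1024 * 1024)
```

Because the TAR bytes are copied through unchanged, decompressing the input and the output gives identical streams.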
Technical Differences Between TGZ and TXZ Formats
Compression Algorithms
TGZ uses DEFLATE, a 1990s algorithm based on LZ77 with a 32 KB dictionary and Huffman coding. The data stream is processed sequentially in small blocks. Advantages are very high speed and minimal memory requirements; the drawback is limited compression density due to the small repetition search window.
TXZ applies LZMA2, a modern modification of the LZMA algorithm developed by Igor Pavlov. LZMA2 uses a large sliding dictionary (64 MiB at xz -9; the xz encoder accepts up to 1.5 GiB), range coding with a context model, and adaptive data stream analysis. Long repetitions are found at large distances, producing significantly tighter compression. The XZ wrapper adds a selectable integrity check (CRC-32, CRC-64, or SHA-256), multithreading support, and preprocessing filters (Delta, BCJ for executables).
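The effect of the search window can be seen with a small experiment. The sketch below builds a synthetic input (an assumption made for the demonstration, not typical archive data): 100 KiB of pseudo-random bytes followed by an exact copy. The copy starts outside DEFLATE's 32 KB window, so only LZMA2 can refer back to it.

```python
import lzma
import random
import zlib

# 100 KiB of incompressible bytes, repeated once. The second copy starts
# 100 KiB after the first, beyond DEFLATE's 32 KiB window.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(100 * 1024))
data = block + block

deflate_size = len(zlib.compress(data, 9))
xz_size = len(lzma.compress(data, preset=6))

print(f"input:   {len(data)} bytes")
print(f"DEFLATE: {deflate_size} bytes")
print(f"LZMA2:   {xz_size} bytes")
```

DEFLATE ends up storing both copies almost in full, while LZMA2 stores roughly one copy plus short back-references.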
Capability Comparison Table
| Characteristic | TGZ | TXZ |
|---|---|---|
| Algorithm year | 1992 (GZIP) | 2009 (XZ) |
| Base algorithm | DEFLATE | LZMA2 |
| Dictionary size | 32 KB | 64 MiB at xz -9 (encoder limit 1.5 GiB) |
| Attribute container | TAR (POSIX) | TAR (POSIX) |
| Compression ratio | Baseline | Typically 30-50% smaller on text-heavy data |
| Compression speed | High | Medium |
| Decompression speed | High | Lower, but still fast |
| Decompression memory | Minimal | About 9 MiB (xz -6) to 65 MiB (xz -9) |
| Multithreading | Limited | Full (xz -T) |
| Checksums | CRC-32 | CRC-32, CRC-64, SHA-256 |
| Preprocessing filters | None | Delta, BCJ |
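The checksum and filter rows can be explored with Python's standard `lzma` module. The sketch below is illustrative only: the helper names are hypothetical, the 1 MiB dictionary is chosen to keep memory use low, and SHA-256 support depends on how the underlying liblzma was built.

```python
import lzma

# Filter chain: x86 BCJ preprocessing followed by LZMA2.
# LZMA2 must be the last filter in the chain.
FILTERS = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": 1024 * 1024},
]


def compress_with_sha256(data):
    """Compress into an .xz stream that carries a SHA-256 integrity check."""
    return lzma.compress(
        data, format=lzma.FORMAT_XZ, check=lzma.CHECK_SHA256, filters=FILTERS
    )


def stored_check(xz_bytes):
    """Return the ID of the integrity check recorded in an .xz stream."""
    decoder = lzma.LZMADecompressor()
    decoder.decompress(xz_bytes)
    return decoder.check
```

Comparing `stored_check(...)` against `lzma.CHECK_CRC64` or `lzma.CHECK_SHA256` shows which check a given archive uses.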
Compression Ratio: Real Examples
Archive size ratios for typical data sets:
| Data type | Original size | TGZ | TXZ (xz -9) | Savings in TXZ |
|---|---|---|---|---|
| Project source code | 100 MB | 18-22 MB | 12-15 MB | 30-40% |
| Text documents | 50 MB | 12-14 MB | 8-10 MB | 30-45% |
| SQL database dump | 200 MB | 35-45 MB | 20-30 MB | 40-55% |
| Server logs | 1 GB | 200-250 MB | 80-120 MB | 50-65% |
| Binary files (with BCJ filter) | 500 MB | 350-400 MB | 280-330 MB | 15-25% |
| XML/JSON documents | 200 MB | 30-40 MB | 18-25 MB | 35-50% |
| JPG images | 500 MB | 498-500 MB | 495-498 MB | minimal |
The XZ advantage is especially notable on source code, logs, and SQL dumps - the data types prevailing in Linux distributions. On already compressed media files the difference is negligible.
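Ratios vary with the data, so it is worth measuring on your own files. The helper below is a rough sketch with a hypothetical name (`compare_sizes`). It compresses one byte string with both algorithms and reports the relative saving. It defaults to preset 6 to keep memory use moderate; the table above uses xz -9, which needs several hundred megabytes while compressing.

```python
import gzip
import lzma


def compare_sizes(raw, xz_preset=6):
    """Compress the same bytes with GZIP and XZ and report both sizes."""
    gz_size = len(gzip.compress(raw, compresslevel=9))
    xz_size = len(lzma.compress(raw, preset=xz_preset))
    return {
        "raw": len(raw),
        "tgz": gz_size,
        "txz": xz_size,
        # Percentage by which the XZ output is smaller than the GZIP output.
        "savings_pct": round(100 * (1 - xz_size / gz_size), 1),
    }
```

For a real archive, pass the decompressed TAR stream, for example `compare_sizes(gzip.open("archive.tgz").read())`, where `archive.tgz` is a placeholder path.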
When TGZ to TXZ Conversion is Necessary
Modernizing Package Repositories
Modern Linux distributions use XZ as the standard:
- Arch Linux - packages in pkg.tar.zst or pkg.tar.xz format, repositories also in xz.
- Debian/Ubuntu - Debian .deb packages contain data.tar.xz and control.tar.xz inside; recent Ubuntu releases have moved to zstd.
- Fedora/RHEL/CentOS - .rpm packages have long used xz for payload compression; newer releases have moved to zstd.
- Slackware - the official txz package format since 2009.
- Gentoo Portage - distfiles for source code predominantly in tar.xz.
Archiving Source Code
The Linux kernel and large projects prefer XZ:
- Linux kernel - kernel.org archives are predominantly in tar.xz since 2013.
- GNU Project - GCC, glibc, binutils are distributed in tar.xz.
- KDE and GNOME - desktop environment releases are packed in xz.
- Apache Foundation - many Apache projects offer tar.xz alongside tar.gz.
Long Term Storage of Text Data
XZ is optimal for archives with rare access where size matters:
- Historical documentation - project documentation snapshots from past years.
- Mail archives - mbox mailboxes, IMAP exports.
- Audit logs - security logs, system change journals.
- Database snapshots - PostgreSQL and MySQL dumps from past periods.
Transfer Over Slow Channels
Smaller size speeds up transfer:
- Linux distribution mirrors - synchronizing millions of packages between servers.
- CI/CD pipelines - build artifacts between assembly and deployment stages.
- Remote backups - backups from offline sites over a limited channel.
- Satellite communication - scientific missions with expensive traffic.
Conversion Process
Transformation Stages
Reading the GZIP header - analysis of magic bytes 1f 8b, compression method, timestamp, and original file name in the outer TGZ wrapper.
Decoding DEFLATE - the algorithm restores the original TAR stream through inverse LZ77 (restoring repetitions from references in the 32 KB window) and inverse Huffman coding.
Preserving the TAR stream - the TAR contents are not modified in any byte: the same 512 byte records, the same ustar or PAX headers, the same data blocks.
Analysis for LZMA2 - compression parameters are determined: dictionary size (typically 64 MB for level 9), mode (fast, normal, or maximum), applicability of preprocessing filters.
Applying filters - if executables (.exe, .so, .o) are present inside, the BCJ filter is activated, converting relative jump addresses to absolute ones for better compression of repeating instructions.
LZMA2 compression - data is processed in blocks with long repetition search in the sliding dictionary. Range coding with a context model is applied.
Packing into the XZ container - blocks are wrapped in the XZ format with a header, block index, and CRC-64 checksum (default).
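The first stage, reading the GZIP header, can be illustrated with a short parser. This is a simplified sketch that assumes the whole header is already in memory and ignores the optional comment and header CRC fields. The function name is hypothetical; the field layout follows the GZIP format (RFC 1952).

```python
import struct

GZIP_MAGIC = b"\x1f\x8b"
XZ_MAGIC = b"\xfd7zXZ\x00"  # first six bytes of every .xz file
FEXTRA, FNAME = 0x04, 0x08  # header flag bits


def read_gzip_header(buf):
    """Parse method, timestamp, and original file name from a GZIP header."""
    if buf[:2] != GZIP_MAGIC:
        raise ValueError("not a gzip stream")
    # Byte 2: compression method (8 = DEFLATE), byte 3: flags,
    # bytes 4-7: modification time as little-endian Unix seconds.
    method, flags, mtime = struct.unpack("<BBI", buf[2:8])
    pos = 10  # skip the XFL and OS bytes
    if flags & FEXTRA:
        (xlen,) = struct.unpack("<H", buf[pos:pos + 2])
        pos += 2 + xlen
    name = None
    if flags & FNAME:
        end = buf.index(b"\x00", pos)  # name is zero-terminated
        name = buf[pos:end].decode("latin-1")
    return {"method": method, "mtime": mtime, "name": name}
```

The same magic-byte check works on the output side: a valid TXZ file starts with `XZ_MAGIC`.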
What is Preserved and What Changes
Fully preserved:
- Contents of every file byte for byte
- File and directory names with Unicode and long name support
- Directory structure of any depth
- Full POSIX attributes: owner, group, permissions, timestamps
- Symbolic and hard links
- Special files (FIFO, devices)
- Extended attributes (xattr) when PAX extensions are present
Changed:
- Final archive size (typically reduced by 30-50%)
- Outer compression algorithm (DEFLATE to LZMA2)
- File extension (.tgz/.tar.gz to .txz/.tar.xz)
- Outer layer checksums (CRC-32 replaced by CRC-64 or SHA-256)
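One way to confirm that nothing inside the TAR changed is to compare member metadata and content hashes from both archives. The sketch below uses Python's standard `tarfile` module; the function names are hypothetical, and it checks only the attributes listed in the tuple, not extended attributes.

```python
import hashlib
import tarfile


def tar_manifest(path):
    """List (name, type, mode, uid, gid, mtime, linkname, sha256) per member."""
    rows = []
    # Mode "r:*" lets tarfile detect gzip or xz compression automatically.
    with tarfile.open(path, "r:*") as tf:
        for member in tf:
            digest = None
            if member.isreg():
                # Hash regular file contents; links and devices carry no data.
                digest = hashlib.sha256(tf.extractfile(member).read()).hexdigest()
            rows.append((member.name, member.type, member.mode, member.uid,
                         member.gid, member.mtime, member.linkname, digest))
    return rows


def same_contents(tgz_path, txz_path):
    """True when both archives hold identical members in the same order."""
    return tar_manifest(tgz_path) == tar_manifest(txz_path)
```

If `same_contents` returns True, only the outer compression layer differs between the two files.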
Comparing TXZ with Other Archive Formats
TXZ vs TGZ
Direct migration from a legacy format.
| Criterion | TXZ | TGZ |
|---|---|---|
| Algorithm | LZMA2 | DEFLATE |
| Compression ratio | 30-50% better | Baseline |
| Compression speed | Slower | Very fast |
| Decompression speed | Slower, still fast | Very fast |
| Modernity | 2009 | 1992 |
| Linux standard | Current | Previous |
TXZ wins on compression ratio; TGZ remains ahead on compression speed, decompression speed, and memory use.
TXZ vs TBZ2
TBZ2 was the predecessor of TXZ in Linux standards.
| Criterion | TXZ | TBZ2 |
|---|---|---|
| Algorithm | LZMA2 | BZIP2 |
| Compression ratio | 10-30% better | Good |
| Decompression speed | Faster | Slower |
| Age | 2009 | 1996 |
| Current standard | Yes | Aging |
TXZ replaced TBZ2 in most major Linux projects.
TXZ vs TAR.ZST
TAR.ZST is a modern competitor with a different balance.
| Criterion | TXZ | TAR.ZST |
|---|---|---|
| Algorithm | LZMA2 (2009) | ZSTD (2016) |
| Compression ratio | Slightly better | Comparable |
| Compression speed | Low | Very high |
| Decompression speed | High | Very high |
| Popularity | Linux standard | Growing |
ZSTD wins on speed, XZ on density; both are actively used in modern Linux.
TXZ Compatibility and Support
Operating Systems
XZ is supported by all modern UNIX systems and Windows:
- Linux - the xz utility and tar integration are present in all distributions. The tar xJf archive.tar.xz command works out of the box.
- macOS - xz is available through Homebrew, MacPorts, or built into macOS Catalina and newer. Archive Utility opens .xz through Finder.
- FreeBSD, OpenBSD, NetBSD - xz is in the base system or ports.
- Windows - 7-Zip, WinRAR, PeaZip, BandiZip open TXZ. The built-in tar command (available since Windows 10 build 17063) handles .tar.xz in recent builds; older builds may lack xz support.
- Android - modern file managers (ZArchiver, MiXplorer) extract TXZ.
- iOS - archive applications in the App Store work with TXZ.
Programming Languages
XZ/LZMA2 support is built in or available through libraries:
| Language | Library |
|---|---|
| Python | lzma, tarfile (with xz mode) |
| Java | apache-commons-compress, XZ for Java |
| C# / .NET | XZ.NET, SharpCompress |
| JavaScript / Node.js | lzma-native, xz-decompress |
| Go | github.com/ulikunitz/xz |
| Rust | xz2, liblzma-rs |
| C / C++ | liblzma (author Lasse Collin) |
| Ruby | ruby-xz |
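As an example of the built-in support, Python's `tarfile` module reads and writes TXZ directly through the `r:xz` and `w:xz` modes. The helpers below are a minimal sketch with hypothetical names that work on in-memory archives.

```python
import io
import tarfile


def make_txz(files, preset=6):
    """Build a .tar.xz archive in memory from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz", preset=preset) as tf:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_txz(txz_bytes):
    """Return (name, size) for every member of an in-memory .tar.xz archive."""
    with tarfile.open(fileobj=io.BytesIO(txz_bytes), mode="r:xz") as tf:
        return [(m.name, m.size) for m in tf.getmembers()]
```

For files on disk, passing a path instead of `fileobj` works the same way.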
Format History
- 1998 - Igor Pavlov develops the LZMA algorithm for 7-Zip.
- 2008 - Lasse Collin creates liblzma and the XZ format as the successor to the LZMA Utils project.
- 2009 - the .xz file format specification is published and XZ Utils 4.999.9beta is released.
- 2013 - the Linux kernel switches archive distribution to tar.xz as the primary format.
- 2010s - gradual displacement of bz2 from Linux distributions in favor of xz.
- Present day - XZ remains a compression standard in Linux alongside the growing ZSTD.
Limitations and Alternatives
When Converting to TXZ is Not Optimal
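- Very frequent extraction with minimal resources - LZMA2 needs roughly the dictionary size in memory during extraction (about 9 MiB at the default xz preset, about 65 MiB at xz -9, more with larger custom dictionaries), far more than GZIP.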
- Already compressed media data - the gain is minimal, compression speed is significantly lower.
- Compatibility with old UNIX - on legacy systems xz may be missing from the base installation.
- Compression speed priority scenarios - for CI/CD and streaming processing ZSTD is preferable.
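On memory-constrained systems the decoder's memory use can be capped explicitly. The sketch below relies on the `memlimit` parameter of Python's `lzma.LZMADecompressor`; the wrapper name `decompress_within` is hypothetical. It returns None when the stream cannot be decoded within the limit, and also when the input is corrupt, since both cases raise `LZMAError`.

```python
import lzma


def decompress_within(xz_bytes, memlimit_bytes):
    """Decompress an .xz stream, or return None if it fails or needs more memory."""
    decoder = lzma.LZMADecompressor(memlimit=memlimit_bytes)
    try:
        return decoder.decompress(xz_bytes)
    except lzma.LZMAError:
        # Raised when the memory limit is exceeded (or the data is invalid).
        return None
```

The required memory depends on the dictionary size recorded in the archive, not on the size of the data, so an archive made with a high preset can fail under a limit that a low-preset archive passes.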
Alternative Scenarios
- TGZ to TAR.ZST - modern fast algorithm with similar compression.
- TGZ to 7Z - cross platform format with the same LZMA2.
- TGZ to TBZ2 - compatibility with older UNIX systems.
- TGZ to TAR - strip compression for content modification.
TXZ is the optimal choice for long term storage and distribution in the Linux ecosystem when maximum compression density combined with full POSIX attribute preservation is required.
What is TGZ to TXZ conversion used for
Modernizing Linux Repositories
Migrating package archives and distributed files to the current Linux compression standard
Long Term Source Code Storage
Archiving releases and git repository snapshots with maximum disk space savings
Mirror Distribution
Reducing archive size on mirror servers and in CDN networks
SQL Dumps and Logs Storage
Compact archival of text data, often 40-65% smaller than with GZIP
Tips for converting TGZ to TXZ
Account for decompression memory
LZMA2 needs roughly the dictionary size in memory during extraction (about 9 MiB at the default xz preset, about 65 MiB at xz -9), far more than GZIP. On very weak systems this may be a factor.
Choose between XZ and ZSTD
XZ provides slightly better compression; ZSTD offers much higher compression speed. For CI/CD and streaming processing ZSTD is more efficient, while for rarely accessed archives XZ wins.