Wednesday, December 26, 2007

XFS: A high-performance journaling filesystem

SGI - Developer Central Open Source XFS
XFS combines advanced journaling technology with full 64-bit addressing and scalable structures and algorithms. This combination delivers the most scalable high-performance filesystem ever conceived.



Features
The XFS filesystem provides the following major features:
Quick Recovery
The XFS journaling technology allows it to restart very quickly after an unexpected interruption, regardless of the number of files it is managing. Traditional filesystems must run a full filesystem consistency check (fsck) after an interruption, which can take many hours to complete on a large filesystem. XFS instead replays its journal, avoiding these lengthy checks.
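To see why recovery time depends on the journal rather than on the number of files, here is a minimal Python sketch of redo-log replay. It assumes a simplified log of committed metadata updates; the `Txn` class, the `recover` function and the record layout are hypothetical and do not reflect the XFS on-disk log format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Txn:
    """Hypothetical log record: metadata updates written as one transaction."""
    updates: List[Tuple[str, str]] = field(default_factory=list)
    committed: bool = False  # False models a record torn by a crash


def recover(metadata: Dict[str, str], log: List[Txn]) -> int:
    """Replay committed transactions onto `metadata` in log order.

    Returns the number of transactions applied. The loop touches only the
    log records, so its cost does not grow with the number of files.
    """
    applied = 0
    for txn in log:
        if not txn.committed:
            # In this model an uncommitted record is the tail being written
            # when the crash hit; stop and drop its updates.
            break
        for key, value in txn.updates:
            metadata[key] = value
        applied += 1
    return applied


# Usage: 100,000 metadata entries, but only three log records to examine.
on_disk = {f"inode{i}": "old" for i in range(100_000)}
log = [
    Txn([("inode1", "size=4096")], committed=True),
    Txn([("inode2", "size=8192"), ("inode3", "unlinked")], committed=True),
    Txn([("inode4", "size=1")], committed=False),
]
applied = recover(on_disk, log)
```

Locating the valid head and tail of a real on-disk log is omitted here; the sketch only shows that replay work scales with the log, not with the filesystem.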
Fast Transactions
The XFS filesystem provides the advantages of journaling while minimizing the performance impact of journaling on read and write data transactions. Its journaling structures and algorithms are tuned to log the transactions rapidly.
XFS uses efficient tree structures for fast searches and rapid space allocation. XFS continues to deliver rapid response times, even for directories with tens of thousands of entries.
Massive Scalability
XFS is a full 64-bit filesystem, and thus is capable of handling filesystems as large as a million terabytes. The theoretical 64-bit limit is higher still: $2^{63} \approx 9 \times 10^{18}$ bytes, or about 9 exabytes.
A million terabytes is thousands of times larger than most large filesystems in use today. This may seem to be an extremely large address space, but it is needed to plan for the exponential disk density improvements observed in the storage industry in recent years. As disk capacity grows, not only does the address space need to be sufficiently large, but the structures and algorithms need to scale. XFS is ready today with the technologies needed for this scalability.
XFS also continues to evolve to match the capabilities of the hardware it is deployed on. Handling large amounts (terabytes) of main memory efficiently, and with it large numbers of active files and large amounts of cached file data, is an area that demands continual improvement. Extending XFS to improve performance on large NUMA machines is also an area of active research and development.
Efficient Allocations
XFS implements extremely sophisticated space management techniques. Efficiency in space management has been achieved through the use of variable sized extents, rather than the simple single-block-at-a-time mechanism of many other filesystems. XFS was the first filesystem to implement delayed space allocation for buffered writes, supports direct I/O, provides an optional realtime allocator, and is able to align allocations based on the geometry of the underlying storage device. The XFS allocator performs admirably in the presence of multiple parallel writers, and is renowned for its resistance to space fragmentation under such conditions.
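The difference between per-block bookkeeping and variable sized extents can be illustrated with a short Python sketch that collapses a file's sorted block numbers into `(start, length)` runs. The `blocks_to_extents` function is hypothetical; real XFS extent records also carry the file offset and are stored in a different format.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (start_block, block_count)


def blocks_to_extents(blocks: List[int]) -> List[Extent]:
    """Collapse a sorted list of block numbers into contiguous runs."""
    extents: List[Extent] = []
    for b in blocks:
        if extents and extents[-1][0] + extents[-1][1] == b:
            start, count = extents[-1]
            extents[-1] = (start, count + 1)  # block extends the current run
        else:
            extents.append((b, 1))  # gap: start a new extent
    return extents


# A 1000-block file laid out contiguously needs 1000 entries in a
# per-block map, but only one extent record.
contiguous = list(range(5000, 6000))
fragmented = [10, 11, 12, 40, 41, 90]
```

The fewer runs a file is split into, the less metadata the allocator has to maintain, which is why resistance to fragmentation and extent-based allocation reinforce each other.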
Excellent Bandwidth
XFS is capable of delivering very close to the raw I/O performance that the underlying hardware can provide. On SGI Altix systems, XFS has demonstrated throughput of multiple gigabytes per second on multi-terabyte filesystems.
Technical Specifications
Technology
Journaled 64-bit filesystem with guaranteed filesystem consistency.
Availability
XFS is available for Linux 2.4 and later Linux kernels.
Online Administration
XFS supports filesystem growth for mounted volumes, allows filesystem "freeze" and "thaw" operations to support volume level snapshots, and provides an online file defragmentation utility.
Quotas
XFS supports user and group quotas. XFS considers quota information as filesystem metadata and uses journaling to avoid the need for lengthy quota consistency checks after a crash. Project quotas are now also supported, and these can be used to provide a form of directory tree quota.
Extended Attributes
XFS implements fully journaled extended attributes. An extended attribute is a name/value pair associated with a file. Attributes can be attached to all types of inodes: regular files, directories, symbolic links, device nodes, and so forth. Attribute values can contain up to 64KB of arbitrary binary data. XFS implements three attribute namespaces: a user namespace available to all users, protected by the normal file permissions; a system namespace, accessible only to privileged users; and a security namespace, used by security modules such as SELinux. The system namespace can be used for protected filesystem metadata such as access control lists (ACLs) and hierarchical storage manager (HSM) file migration status.
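On Linux, an extended attribute's namespace is selected by a prefix on its name, as in `user.comment`. The hypothetical checker below encodes the rules as this section states them: a known namespace prefix, a 64KB value limit, and privilege for the system namespace. It models the description here, not the kernel's actual permission checks; treating `security.` as privileged and the name `system.hsm_state` are assumptions of the sketch.

```python
# Rules as described in the text above; a simplified, hypothetical model.
MAX_VALUE_BYTES = 64 * 1024  # "up to 64KB of arbitrary binary data"

# Namespace -> does writing it require a privileged caller?
# Treating "security" as privileged is an assumption of this sketch.
NAMESPACES = {"user": False, "system": True, "security": True}


def check_xattr(name: str, value: bytes, privileged: bool) -> str:
    """Validate an attribute write and return its namespace.

    user.* writes are still subject to normal file permissions,
    which this sketch does not model.
    """
    namespace, dot, attr = name.partition(".")
    if not dot or not attr:
        raise ValueError(f"missing namespace prefix: {name!r}")
    if namespace not in NAMESPACES:
        raise ValueError(f"unknown namespace: {namespace!r}")
    if NAMESPACES[namespace] and not privileged:
        raise PermissionError(f"{namespace}.* requires privilege")
    if len(value) > MAX_VALUE_BYTES:
        raise ValueError(f"value of {len(value)} bytes exceeds {MAX_VALUE_BYTES}")
    return namespace
```

For example, `check_xattr("user.comment", b"hello", privileged=False)` returns `"user"`, while the same call against `system.hsm_state` raises `PermissionError`.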
POSIX Access Control Lists (ACLs)
XFS supports the ACL semantics and interfaces described in the draft POSIX 1003.1e standard.
Maximum File Size
For Linux 2.4, the maximum accessible file offset is 16TB with a 4K page size and 64TB with a 16K page size. For Linux 2.6, when 64-bit addressing is enabled in the block device layer (CONFIG_LBD), the file size limit increases to 9 million terabytes (or the device limit).
Maximum Filesystem Size
For Linux 2.4, the limit is 2 TB. For Linux 2.6 and beyond, when 64-bit addressing is enabled in the block device layer (CONFIG_LBD) on a 64-bit platform, the filesystem size limit increases to 9 million terabytes (or the device limit). For these later kernels on 32-bit platforms, the current limit is 16TB, even with 64-bit addressing enabled in the block layer.
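The smaller limits above are consistent with a 32-bit index multiplied by a unit size. The Python sketch below reproduces that arithmetic, assuming (an interpretation, not a statement from the XFS documentation) a 32-bit page-cache index for file offsets and a 32-bit count of 512-byte sectors in the Linux 2.4 block layer. Binary units are used for the small limits; the 64-bit figure is shown in decimal terabytes.

```python
KIB = 1024
TIB = 1024 ** 4  # binary terabyte (tebibyte)


def max_bytes(index_bits: int, unit_bytes: int) -> int:
    """Bytes addressable by an unsigned index of `index_bits` bits,
    where each index value names one unit of `unit_bytes` bytes."""
    return (1 << index_bits) * unit_bytes


# Assumed sources of the limits quoted in the text:
file_limit_4k = max_bytes(32, 4 * KIB)    # 32-bit page index, 4K pages
file_limit_16k = max_bytes(32, 16 * KIB)  # 32-bit page index, 16K pages
fs_limit_24 = max_bytes(32, 512)          # 32-bit sector count, 512-byte sectors
limit_64bit = 1 << 63                     # bytes addressable with a signed 64-bit offset

print(file_limit_4k // TIB, file_limit_16k // TIB, fs_limit_24 // TIB)  # 16 64 2
print(f"{limit_64bit / 10**12:,.0f} decimal TB")  # 9,223,372 decimal TB
```

Under the same assumption, the 2^32 × 4K product also matches the 16TB ceiling quoted for 32-bit platforms.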
Filesystem Block Size
The minimum filesystem block size is 512 bytes. The maximum filesystem block size is the kernel page size, which is 4K on the x86 architecture and is set as a kernel compile option on the IA64 architecture (up to 64 kilobyte pages). XFS therefore supports filesystem block sizes from 512 bytes up to 64 kilobytes, in powers of 2, when the kernel page size allows it.
Filesystem extents (contiguous data) are configurable at file creation time using xfsctl(3) and are multiples of the filesystem block size. Individual extents can be up to 4 GB in size.
Physical Disk Sector Sizes Supported
512 bytes through to 32 kilobytes (in powers of 2), with the caveat that the sector size must be less than or equal to the filesystem blocksize.
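The block size, extent size, and sector size rules above can be collected into one check. This is a hypothetical helper written from the constraints as stated in this section (including the 4 GB extent ceiling); it is not part of mkfs.xfs or xfsctl(3).

```python
GIB = 1024 ** 3


def is_power_of_two(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0


def valid_block_size(block_size: int, page_size: int) -> bool:
    # 512 bytes up to the kernel page size, capped at 64 KB, powers of 2.
    return is_power_of_two(block_size) and 512 <= block_size <= min(page_size, 64 * 1024)


def valid_sector_size(sector_size: int, block_size: int) -> bool:
    # 512 bytes to 32 KB, powers of 2, and never larger than the block size.
    return (
        is_power_of_two(sector_size)
        and 512 <= sector_size <= 32 * 1024
        and sector_size <= block_size
    )


def valid_extent_size(extent_size: int, block_size: int) -> bool:
    # A whole number of filesystem blocks, at most 4 GB.
    return 0 < extent_size <= 4 * GIB and extent_size % block_size == 0
```

For example, an 8K block size is rejected on a 4K-page x86 kernel, while a 64K block size with 4K sectors passes on an IA64 kernel built with 64K pages.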
NFS Compatibility
With NFS version 3, 64-bit filesystems can be exported to other systems that support the NFS V3 protocol. Systems that use NFS V2 protocol may access XFS filesystems within the 32-bit limit imposed by the protocol.
Windows Compatibility
SGI uses the Open Source Samba server to export XFS filesystems to Microsoft Windows systems. Samba speaks the SMB (Server Message Block) and CIFS (Common Internet File System) protocols.
Backup/Restore
xfsdump and xfsrestore can be used to back up and restore XFS filesystems to local or remote SCSI tapes or to files. They support dumping of extended attributes and quota information. Because the xfsdump format has been preserved and is now endian neutral, dumps created on one platform can be restored onto an XFS filesystem on another, even across different architectures and different operating systems (IRIX to Linux, and vice versa).
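Endian neutrality means that multi-byte fields are always written in one fixed byte order, whatever the CPU that produced the dump. A minimal Python sketch of the idea, using the standard `struct` module with an explicit big-endian format; the three-field record layout and the magic value are hypothetical and are not the xfsdump format.

```python
import struct

# '>' forces big-endian byte order and no padding, on any host CPU.
HEADER_FORMAT = ">IQQ"  # hypothetical: 32-bit magic, 64-bit inode, 64-bit size
HEADER_MAGIC = 0x58465344  # hypothetical magic value ("XFSD" in ASCII)


def encode_header(inode: int, size: int) -> bytes:
    return struct.pack(HEADER_FORMAT, HEADER_MAGIC, inode, size)


def decode_header(data: bytes) -> tuple:
    magic, inode, size = struct.unpack(HEADER_FORMAT, data)
    if magic != HEADER_MAGIC:
        raise ValueError("not a header record")
    return inode, size


record = encode_header(inode=131, size=4096)
```

Because the byte order is part of the format rather than inherited from the host, a record written on a big-endian machine decodes identically on a little-endian one.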
Support for Hierarchical Storage
The Data Management API (DMAPI/XDSM) allows implementation of hierarchical storage management software with no kernel modifications as well as high-performance dump programs without requiring "raw" access to the disk and knowledge of filesystem structures.
Optional Realtime Allocator
XFS supports the notion of a "realtime subvolume" - a separate area of disk space where only file data is stored. Space on this subvolume is managed using the realtime allocator (as opposed to the default, B+ tree space allocator). The realtime subvolume is designed to provide very deterministic data rates suitable for media streaming applications.