Monday, December 31, 2007

XFS: A high-performance journaling filesystem

SGI - Developer Central Open Source XFS
XFS combines advanced journaling technology with full 64-bit addressing and scalable structures and algorithms. This combination delivers the most scalable high-performance filesystem ever conceived.


Features


The XFS filesystem provides the following major features:


  • Quick Recovery

    The XFS journaling technology allows it to restart very quickly after an
    unexpected interruption, regardless of the number of files it is managing.
    Traditional filesystems must do special filesystem checks after an interruption,
    which can take many hours to complete. The XFS journaling avoids these lengthy
    filesystem checks.


  • Fast Transactions

    The XFS filesystem provides the advantages of journaling while minimizing the
    performance impact of journaling on read and write data transactions. Its
    journaling structures and algorithms are tuned to log the transactions rapidly.


    XFS uses efficient tree structures for fast searches and rapid space
    allocation. XFS continues to deliver rapid response times, even for directories
    with tens of thousands of entries.


  • Massive Scalability

    XFS is a full 64-bit filesystem, and thus is capable of handling filesystems
    as large as a million terabytes.

    2^63 = 9 x 10^18 = 9 exabytes


    A million terabytes is thousands of times larger than most large filesystems
    in use today. This may seem to be an extremely large address space, but it is
    needed to plan for the exponential disk density improvements observed in the
    storage industry in recent years. As disk capacity grows, not only does the
    address space need to be sufficiently large, but the structures and algorithms
    need to scale. XFS is ready today with the technologies needed for this
    scalability.


    XFS also continues to evolve to match the capabilities of the hardware it is
    deployed on. Handling terabytes of main memory efficiently, and with it large
    numbers of active files and large amounts of cached file data, is an area
    demanding continual improvement. Extending XFS to improve performance on
    large NUMA machines is also an area of active research and development.


  • Efficient Allocations

    XFS implements extremely sophisticated space management techniques.
    Efficiency in space management has been achieved through the use of variable
    sized extents, rather than the simple single-block-at-a-time mechanism of many
    other filesystems. XFS was the first filesystem to implement delayed space
    allocation for buffered writes, supports direct I/O, provides an optional
    realtime allocator, and is able to align allocations based on the geometry of
    the underlying storage device. The XFS allocator performs admirably in the
    presence of multiple parallel writers, and is renowned for its resistance to
    space fragmentation under such conditions.


  • Excellent Bandwidth

    XFS is capable of delivering very close to the raw I/O performance that the
    underlying hardware can provide. On SGI Altix systems, XFS has demonstrated
    throughput of multiple gigabytes per second on multi-terabyte filesystems.



Technical Specifications


Technology


Journaled 64-bit filesystem with guaranteed filesystem consistency.


Availability


XFS is available for Linux 2.4 and later Linux kernels.


Online Administration


XFS supports filesystem growth for mounted volumes, allows filesystem
"freeze" and "thaw" operations to support volume level snapshots, and provides
an online file defragmentation utility.
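
The freeze and thaw operations are typically wrapped around a volume-level snapshot. The following is a minimal Python sketch of that sequence, assuming the xfs_freeze utility (-f to freeze, -u to thaw) is installed. The snapshot command itself is supplied by the caller and is hypothetical here, as are the helper names frozen and snapshot.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def frozen(mountpoint, run=subprocess.check_call):
    """Keep an XFS filesystem frozen for the duration of a with-block.

    `run` is injectable so the sequence can be exercised without root
    privileges or a real XFS mount (an illustrative design choice).
    """
    run(["xfs_freeze", "-f", mountpoint])      # freeze: suspend new writes
    try:
        yield
    finally:
        run(["xfs_freeze", "-u", mountpoint])  # thaw, even if the snapshot failed


def snapshot(mountpoint, snapshot_cmd, run=subprocess.check_call):
    """Run a caller-supplied snapshot command while the filesystem is frozen."""
    with frozen(mountpoint, run):
        run(snapshot_cmd)
```

The try/finally ensures the filesystem is thawed even when the snapshot command fails, so a failed snapshot does not leave the volume blocked for writers.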


Quotas


XFS supports user and group quotas. XFS considers quota information as
filesystem metadata and uses journaling to avoid the need for lengthy quota
consistency checks after a crash. Project quotas are now also supported, and
these can be used to provide a form of directory tree quota.


Extended Attributes


XFS implements fully journaled extended attributes. An extended attribute is
a name/value pair associated with a file. Attributes can be attached to all
types of inodes: regular files, directories, symbolic links, device nodes, and
so forth. Attribute values can contain up to 64KB of arbitrary binary data. XFS
implements three attribute namespaces: a user namespace available to all users,
protected by the normal file permissions; a system namespace, accessible only to
privileged users; and a security namespace, used by security modules (SELinux).
The system namespace can be used for protected filesystem metadata such as
access control lists (ACLs) and hierarchical storage manager (HSM) file
migration status.
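
As an illustration of the name/value model described above, here is a minimal Python sketch built on the standard os.setxattr and os.getxattr calls (available on Linux only). It assumes attribute names carry their namespace as a dotted prefix such as "user."; the helper names and the up-front checks are illustrative and not part of XFS itself.

```python
import os

MAX_VALUE_BYTES = 64 * 1024  # the 64KB value limit quoted above

# Assumption: the three namespaces named above appear as name prefixes.
NAMESPACES = ("user", "system", "security")


def namespace_of(name):
    """Return the namespace prefix of an attribute name such as 'user.origin'."""
    prefix, sep, rest = name.partition(".")
    if not sep or not rest or prefix not in NAMESPACES:
        raise ValueError(f"unsupported attribute name: {name!r}")
    return prefix


def set_attr(path, name, value):
    """Attach a name/value pair to `path` after basic checks."""
    namespace_of(name)                      # rejects names without a known prefix
    if len(value) > MAX_VALUE_BYTES:
        raise ValueError("attribute value exceeds 64KB")
    os.setxattr(path, name, value)          # value must be a bytes object


def get_attr(path, name):
    """Read an attribute value back as bytes."""
    return os.getxattr(path, name)
```

For example, set_attr("report.txt", "user.origin", b"scanner-3") would tag a file in the user namespace, subject to the normal file permissions; the file name and attribute are made up for the example.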


POSIX Access Control Lists (ACLs)


XFS supports the ACL semantics and interfaces described in the draft POSIX
1003.1e standard.


Maximum File Size


For Linux 2.4, the maximum accessible file offset is 16TB with a 4K page size
and 64TB with a 16K page size. For Linux 2.6, when using 64-bit addressing in
the block devices layer (CONFIG_LBD), the file size limit increases to 9 million
terabytes (or the device limits).


Maximum Filesystem Size


For Linux 2.4, the limit is 2 TB. For Linux 2.6 and beyond, when using 64-bit
addressing in the block devices layer (CONFIG_LBD) and a 64-bit platform, the
filesystem size limit increases to 9 million terabytes (or the device limits).
For these later kernels on 32-bit platforms, 16TB is the current limit even with
64-bit addressing enabled in the block layer.
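
The file and filesystem size limits quoted above can be reproduced with simple arithmetic. The Python sketch below assumes that the 16TB and 64TB figures come from a 32-bit page index multiplied by the page size, and that the 2 TB figure comes from 32-bit sector numbers with 512-byte sectors. The text above does not state these derivations, so treat them as an interpretation; the function names are illustrative.

```python
TIB = 2 ** 40  # one binary terabyte in bytes


def page_index_limit(page_size, index_bits=32):
    """Bytes addressable with an `index_bits`-wide page index."""
    return (2 ** index_bits) * page_size


def sector_limit(sector_size=512, sector_bits=32):
    """Bytes addressable with `sector_bits`-wide sector numbers."""
    return (2 ** sector_bits) * sector_size


print(page_index_limit(4096) // TIB)    # 16  (4K pages)
print(page_index_limit(16384) // TIB)   # 64  (16K pages)
print(sector_limit() // TIB)            # 2   (512-byte sectors)
print(2 ** 63 / 10 ** 18)               # about 9.22, i.e. roughly 9 million terabytes
```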


Filesystem Block Size


The minimum filesystem block size is 512 bytes. The maximum filesystem block
size is the page size of the kernel, which is 4K on x86 architecture and is set
as a kernel compile option on the IA64 architecture (up to 64 kilobyte pages).
So, XFS supports filesystem block sizes up to 64 kilobytes (from 512 bytes, in
powers of 2), when the kernel page size allows it.


Filesystem extents (contiguous data) are configurable at file creation time
using xfsctl(3) and are multiples of the filesystem block size. Individual
extents can be up to 4 GB in size.


Physical Disk Sector Sizes Supported


512 bytes through to 32 kilobytes (in powers of 2), with the caveat that the
sector size must be less than or equal to the filesystem blocksize.
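
The block-size and sector-size rules above can be collected into a small validity check. This is a sketch in Python of the constraints as stated in this document, not the checks performed by any XFS tool; all names are illustrative.

```python
MIN_BLOCK = 512
MAX_BLOCK = 64 * 1024
MIN_SECTOR = 512
MAX_SECTOR = 32 * 1024


def is_power_of_two(n):
    return n > 0 and n & (n - 1) == 0


def valid_block_sizes(page_size):
    """Block sizes allowed for a given kernel page size (powers of 2)."""
    upper = min(page_size, MAX_BLOCK)
    sizes = []
    size = MIN_BLOCK
    while size <= upper:
        sizes.append(size)
        size *= 2
    return sizes


def check_geometry(block_size, sector_size, page_size):
    """Return a list of rule violations; empty means the geometry is acceptable."""
    problems = []
    if block_size not in valid_block_sizes(page_size):
        problems.append("block size must be a power of 2 from 512 up to the page size")
    if not (is_power_of_two(sector_size) and MIN_SECTOR <= sector_size <= MAX_SECTOR):
        problems.append("sector size must be a power of 2 from 512 to 32K")
    if sector_size > block_size:
        problems.append("sector size must not exceed the block size")
    return problems
```

With a 4K page size, valid_block_sizes(4096) yields 512, 1024, 2048 and 4096, and a 4096-byte block with 512-byte sectors passes all three checks.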


NFS Compatibility


With NFS version 3, 64-bit filesystems can be exported to other systems that
support the NFS V3 protocol. Systems that use NFS V2 protocol may access XFS
filesystems within the 32-bit limit imposed by the protocol.


Windows Compatibility


SGI uses the Open Source Samba server to export XFS filesystems to Microsoft
Windows systems. Samba speaks the SMB (Server Message Block) and CIFS (Common
Internet File System) protocols.


Backup/Restore


xfsdump and xfsrestore can be used for backup and restore of XFS filesystems
to local/remote SCSI tapes or files. They support dumping of extended attributes
and quota information. As the xfsdump format has been preserved and is now
endian neutral, dumps created on one platform can be restored onto an XFS
filesystem on another (different architectures, and even different operating
systems - IRIX to Linux, and vice-versa).
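
A dump to a file followed by a restore might be driven as in the Python sketch below. It only builds the command lines and assumes the commonly used xfsdump options -l (dump level), -L (session label), -M (media label) and -f (destination), and the xfsrestore option -f (source). The paths, label, and function names are placeholders for the example.

```python
def dump_command(mountpoint, dump_file, level=0, label="backup"):
    """Build an xfsdump command line for a dump to a regular file.

    Supplying both labels up front is assumed to avoid interactive prompts.
    """
    return [
        "xfsdump",
        "-l", str(level),   # 0 requests a full dump
        "-L", label,        # session label
        "-M", label,        # media label
        "-f", dump_file,
        mountpoint,
    ]


def restore_command(dump_file, dest_dir):
    """Build the matching xfsrestore command line."""
    return ["xfsrestore", "-f", dump_file, dest_dir]


# Example with placeholder paths; pass the lists to subprocess.check_call to run them.
print(dump_command("/mnt/data", "/backup/data.dump"))
print(restore_command("/backup/data.dump", "/mnt/restore"))
```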


Support for Hierarchical Storage


The Data Management API (DMAPI/XDSM) allows implementation of hierarchical
storage management software with no kernel modifications as well as
high-performance dump programs without requiring "raw" access to the disk and
knowledge of filesystem structures.


Optional Realtime Allocator


XFS supports the notion of a "realtime subvolume" - a separate area of disk
space where only file data is stored. Space on this subvolume is managed using
the realtime allocator (as opposed to the default, B+ tree space allocator). The
realtime subvolume is designed to provide very deterministic data rates suitable
for media streaming applications.