Distributed file system

Distributed File System (DFS) is a role service under the File and Storage Services role that enables you to group shares from different servers into a single logical share called a namespace.

DFS has four main components:

[Figure: namespace organization]
Namespace server—A namespace server is a Windows server with the DFS Namespaces role service installed.
Namespace root—The namespace root is a folder that’s the logical starting point for a namespace. It contains one or more folders or folder targets but no files. To access it, you use a UNC path, such as \\Domain1\AllShares or \\DFSServer\AllShares.
Folder—A folder can be used to organize the namespace without containing any actual files, or a folder can contain one or more folder targets. A folder without folder targets simply adds structure to the namespace hierarchy. For example, a folder named Marketing Docs might contain one or more folders with folder targets that are shared folders containing files for the Marketing Department. In the figure above, Share1 and Share2 are folders, and both contain folder targets.
Folder target—A folder target is a UNC path that points to a shared folder hosted on a server. A folder can have one or more folder targets. If there’s more than one folder target, the files are usually replicated between servers to provide fault tolerance. In the figure, the folder target for Share1 is \\Server1\Share1, and for Share2, it’s \\Server2\Share2. The folder names can be the same as the share name, but they don’t have to be.
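
To make the relationship between these components concrete, the following Python sketch models a namespace as a root plus a mapping from folders to folder targets, using the example paths above. It is only an illustration of the lookup idea, not a Windows API: the resolve function and the file name report.docx are hypothetical, and path comparison is case-sensitive here for simplicity.

```python
# Illustrative model of a DFS namespace; not a Windows API.
NAMESPACE_ROOT = r"\\Domain1\AllShares"

# Folder name -> list of folder targets (UNC paths to shared folders).
FOLDERS = {
    "Share1": [r"\\Server1\Share1"],
    "Share2": [r"\\Server2\Share2"],
}

def resolve(namespace_path):
    """Map a path inside the namespace to the matching path on each folder target."""
    prefix = NAMESPACE_ROOT + "\\"
    if not namespace_path.startswith(prefix):
        raise ValueError("path is not inside the namespace")
    folder, _, rest = namespace_path[len(prefix):].partition("\\")
    targets = FOLDERS.get(folder)
    if not targets:
        raise LookupError(f"folder {folder!r} has no folder targets")
    return [target + ("\\" + rest if rest else "") for target in targets]

# The user sees one logical path; the file actually lives on Server1.
print(resolve(r"\\Domain1\AllShares\Share1\report.docx"))
```

Adding a second UNC path to a folder's list would model a folder with two targets, which is the case where replication between the servers becomes relevant.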

One way to help ensure reliable access to files is to use replication to make copies of files in different locations.

DFS replication isn’t designed to be a substitute for regular backups, but it can be used to enhance backup effectiveness and efficiency.

Namespace types

There are two types of namespaces: domain-based and stand-alone. The type you choose depends on several factors: whether you’re using Active Directory, the availability requirements of the namespace, the number of folders needed in a namespace, and the need for access-based enumeration (which allows users to see only files they actually have access to).
A namespace must be stored somewhere on the network, and the type of namespace determines the storage location.

  • A domain-based namespace enables you to increase its availability by using multiple namespace servers in the same domain. This namespace type doesn’t include the server name in the namespace, making it easier to replace a namespace server or move the namespace to a different server.
  • A stand-alone namespace stores information only on the server where it’s created and includes the server name in the namespace. If this server becomes unavailable, the namespace becomes unavailable, too. Stand-alone namespaces in Windows Server 2012 R2 can support up to 50,000 folders and access-based enumeration.

A domain-based namespace can run in one of two modes:

  • Windows Server 2008 mode is available if the domain uses the Windows Server 2008 (or higher) functional level, and all the namespace servers are running Windows Server 2008, Windows Server 2008 R2, Windows Server 2012, or Windows Server 2012 R2.
  • Windows 2000 Server mode is the alternative. If you choose it (or have no choice because of the domain’s structure), domain-based namespaces are limited to 5000 folders and don’t support access-based enumeration.
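
The availability rule for the two modes can be written as a small check. This Python sketch is a simplification under stated assumptions: the namespace_mode function is hypothetical, and domain functional levels are represented by the same names as server releases so that one ordered list can be used for both comparisons.

```python
# Windows Server releases in order; list position is used to compare versions.
RELEASES = [
    "Windows 2000 Server",
    "Windows Server 2003",
    "Windows Server 2008",
    "Windows Server 2008 R2",
    "Windows Server 2012",
    "Windows Server 2012 R2",
]
MINIMUM = RELEASES.index("Windows Server 2008")

def namespace_mode(domain_functional_level, namespace_servers):
    """Return the most capable mode the environment allows, per the rules above."""
    everything = [domain_functional_level, *namespace_servers]
    if all(RELEASES.index(release) >= MINIMUM for release in everything):
        return "Windows Server 2008 mode"
    # Fallback mode: limited to 5000 folders, no access-based enumeration.
    return "Windows 2000 Server mode"

print(namespace_mode("Windows Server 2012", ["Windows Server 2012 R2", "Windows Server 2008"]))
print(namespace_mode("Windows Server 2003", ["Windows Server 2012 R2"]))
```

A single older namespace server or a lower functional level is enough to force the fallback mode in this model.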

File security in DFS namespaces is managed via the same permissions as for standard files and folders: share permissions and NTFS permissions. As a general rule, adjusting permissions on shares before configuring DFS is best. However, if multiple servers and folder targets are used with DFS replication, permissions on files and folders are replicated by DFS.

Configuring Referrals and Advanced Namespace Settings

A simple DFS namespace with a single server for hosting the namespace and a single folder target for each folder might not require more configuration. However, if you want to add fault tolerance and load sharing to a DFS namespace, you might want to configure the namespace’s properties. To do so, right-click the namespace in DFS Management and click Properties.

The namespace Properties dialog box has three tabs: General, Referrals, and Advanced.

  • The General tab just supplies information about the namespace, such as name, type (Windows Server 2008 or Windows 2000 Server), an optional description, and the number of folders in the namespace.
  • The Referrals tab defines how DFS works when there are multiple servers for a namespace root or folder target. Recall that the namespace root can have multiple servers hosting it, and each folder can have multiple targets. When a client attempts to access a namespace root or the underlying folders, it receives a referral, which is a list of servers (targets) that host the namespace or folder. The client then attempts to access the first server in the referral list. If the first server is unavailable, the client attempts to access the second server in the referral list and so forth.
    The first option in the Referrals tab is the cache duration, which is the time (300 seconds by default) a client keeps a referral before requesting it again. By caching the referral, the client doesn’t have to request the referral list each time it accesses the namespace, thereby maximizing access speed and reducing the bandwidth needed to access the namespace.
    The next option is the ordering method, which determines the order in which servers are listed in a referral and can be set to the following values:
    • Lowest cost—Lists servers in the same Active Directory site as the client first. If there’s more than one server in the site, servers in the same site as the client are listed in random order. Servers outside the client’s site are listed from lowest cost to highest cost. Cost is based on the cost value assigned to a site in Active Directory Sites and Services.
    • Random order—Similar to the “Lowest cost” option, servers in the same Active Directory site as the client are listed first. However, servers outside the client’s site are ordered randomly, ignoring cost.
    • Exclude targets outside of the client’s site—The referral contains only servers in the same site as the client. If there are no servers in the client’s site, the client can’t access the requested part of the namespace. This method can be used to ensure that low-bandwidth connections, such as virtual private networks (VPNs), can’t access shares containing large files.
    The last option in the Referrals tab, under the Ordering method list box, is “Clients fail back to preferred targets.” It’s important only if referral order has been overridden in the properties of the namespace server or folder target, which essentially configures a preferred target.
  • The Advanced tab has options for configuring polling and access-based enumeration. When namespaces change, changes are reflected instantly in a stand-alone namespace. If a domain-based namespace changes, however, information must be relayed to all the namespace servers. Namespace changes are first reported to the server in the domain holding the PDC emulator Flexible Single Master Operation (FSMO) role. The PDC emulator then replicates this information to all other domain controllers.
    By default, namespace servers poll the PDC emulator to get the most current information for a namespace. In DFS configurations with many namespace servers, polling can place a considerable load on the PDC emulator. The more namespace servers in a domain, the larger the load is on the PDC emulator because of increased polling. If necessary, you can configure these polling options to reduce the load on the PDC emulator:
    • Optimize for consistency—This setting is the default. In a domain with 16 or fewer namespace servers, this method is preferred because namespace servers poll the PDC emulator, which is the first DC updated after a namespace change.
    • Optimize for scalability—This setting causes namespace servers to poll the nearest DC for namespace changes. This setting reduces the load on the PDC emulator but should be used only when there are more than the recommended 16 namespace servers in the domain. Because there’s a delay between the PDC emulator getting a namespace update and the other DCs receiving it, users might have an inconsistent view of a namespace.
    The last option in the Advanced tab is for enabling access-based enumeration for the namespace. Making sure only authorized users have access to sensitive data is a concern in most organizations. Restricting permissions on files and folders certainly helps, but to improve security, you can enable access-based enumeration to prevent users from even seeing files and folders they don’t have permission to access.

Overriding Referral Order

You can use the namespace Properties dialog box to configure referral settings that affect all folder targets in the namespace. However, you might want to override these settings for a particular folder target. For example, suppose a folder has two targets: one is a high-performance file server, and the other has lower performance. In this case, you might want the high-performance server to be the preferred server clients use when accessing the folder instead of using the normal referral order. To do so, in the DFS Management console, click the folder you want to change, and then right-click the folder target and click Properties. In the Properties dialog box, click the Advanced tab and select the “Override referral ordering” check box (see Figure 3-10). Then select one of the following target priorities:
    • First among all targets—This server is the default target if it’s available. Use this option if you want clients to always use this target to access the folder.
    • Last among all targets—You want clients to use this target only if no other targets are available.
    • First among targets of equal cost—If more than one target exists in a site, this target is always listed first in the referral list.
    • Last among targets of equal cost—If more than one target exists in a site, this target is always listed last in the referral list.
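
The ordering methods and target priorities described above can be combined into a single sort. The Python sketch below is a simplified model of how a referral list might be built, not the actual DFS implementation: the build_referral function, the method and priority labels, the server names, and the site costs are all hypothetical, and site_cost is assumed to hold the cost from the client's site to each other site.

```python
import random

def build_referral(targets, client_site, site_cost, method="lowest_cost", rng=random):
    """targets: list of dicts with 'server', 'site', and an optional 'priority'."""
    if method == "exclude_outside":
        targets = [t for t in targets if t["site"] == client_site]

    def cost(t):
        if t["site"] == client_site:
            return 0  # same-site targets always come first
        # "Random order" ignores cost outside the client's site.
        return site_cost[t["site"]] if method == "lowest_cost" else 1

    def key(t):
        priority = t.get("priority")
        overall = {"first_all": 0, "last_all": 2}.get(priority, 1)
        within = {"first_equal_cost": 0, "last_equal_cost": 2}.get(priority, 1)
        # The random component shuffles targets that are otherwise tied.
        return (overall, cost(t), within, rng.random())

    return [t["server"] for t in sorted(targets, key=key)]

targets = [
    {"server": "FS-Slow", "site": "HQ"},
    {"server": "FS-Fast", "site": "HQ", "priority": "first_equal_cost"},
    {"server": "FS-Branch", "site": "Branch"},
    {"server": "FS-Far", "site": "Remote"},
]
site_cost = {"Branch": 100, "Remote": 500}  # cost as seen from the HQ site

print(build_referral(targets, "HQ", site_cost))
print(build_referral(targets, "HQ", site_cost, method="exclude_outside"))
```

A client works through the returned list from the front, so the high-performance server marked "first among targets of equal cost" is tried before the other server in the same site.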

Replication Groups

A replication group consists of servers, known as members, that synchronize data in folders so that when a change occurs, all replication group members are updated at once. To create a replication group, you must have a minimum of two servers. One server is designated as the primary and the other as the secondary. After a replication group is defined, you add folders to it. Files in replicated folders on the secondary server, if any, are overwritten.

There are several maximums to take into account when creating a replication group:
  • A single file to be replicated must be less than 250 GB.
  • The number of files to be replicated on a volume must be less than 70 million.
  • The total size of all replicated files on a server must be less than 100 TB.
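
These maximums are easy to check before a replication group is created. The following Python sketch assumes binary units (1 GB = 1024^3 bytes) and uses a hypothetical check_replication_limits function; the figures passed in would come from your own inventory of the data.

```python
GB = 1024 ** 3
TB = 1024 ** 4

def check_replication_limits(largest_file, files_on_volume, total_on_server):
    """Return a list of limit violations; an empty list means the plan fits."""
    problems = []
    if largest_file >= 250 * GB:
        problems.append("a single file must be less than 250 GB")
    if files_on_volume >= 70_000_000:
        problems.append("a volume must hold fewer than 70 million replicated files")
    if total_on_server >= 100 * TB:
        problems.append("replicated files on a server must total less than 100 TB")
    return problems

# Example: one oversized file, everything else within limits.
print(check_replication_limits(largest_file=300 * GB,
                               files_on_volume=2_000_000,
                               total_on_server=40 * TB))
```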

A server participating in a DFS replication group must have the DFS Replication role service installed.

There are two types of replication groups:

  1. A multipurpose replication group contains two or more servers and is used for content sharing and document publication when you want to provide fault tolerance and load balancing for file shares.
  2. A replication group for data collection consists of only two servers and is used mainly to transfer data from one server to another for backup purposes.

Optimizing DFS Replication

Replication Topology

A replication topology describes the connections used to replicate files between servers. Three topologies are available for replication groups: hub and spoke, full mesh, or no topology.

  • With hub and spoke, all members of the group synchronize with the hub only. So a change on one member is updated on the hub, and then the hub replicates the change to all other members. This topology is available only if the group has three or more servers. You can specify a primary hub and a secondary hub for each spoke member. With a secondary hub in place, if one of the hubs goes down, members are configured to recognize the secondary hub, and replication occurs with it. The two hubs synchronize with each other. This topology reduces the overall network load because the spokes don’t synchronize with each other, only with the hub. However, there can be a slight delay in propagating changes throughout the group because the hub must distribute all the changes.
  • With a full mesh topology, which is the default, synchronization is bidirectional, meaning all members synchronize with each other. It’s ideal when you have just a few servers. In a larger network with 10 or more members or when you have a main office connected to several branch offices, switching to a hub and spoke topology might be best to reduce network traffic. With large replication groups, the network load of communicating between all servers could become severe, depending on the replication schedule (discussed in the next section), the total number of files, the number of changes, and the overall size of files.
  • The “no topology” option is exactly what it sounds like: There are no initial connections, so you must define them. When would defining your own connections be useful? Say you have a central server where changes are being made and several other servers where you want files available locally, but they should be read-only. You could configure a hub and spoke topology, with the central server as the hub where files are updated and the other servers as spokes with read-only copies of files (which is an option when configuring a replication group). The hub synchronizes changes with the spokes, but it’s a one-way synchronization: The members never change the files, so they never replicate them back to the hub or to other members.
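
The difference in network load between the topologies comes down to how many connections each one creates. This Python sketch counts them for a group with a single hub; it treats a connection as a one-way (sending, receiving) pair, and the function names and member names are hypothetical.

```python
from itertools import permutations

def full_mesh(members):
    """Every member replicates with every other member, in both directions."""
    return list(permutations(members, 2))

def hub_and_spoke(hub, spokes, one_way=False):
    """Spokes replicate only with the hub; one_way models read-only spokes."""
    connections = [(hub, spoke) for spoke in spokes]
    if not one_way:
        connections += [(spoke, hub) for spoke in spokes]
    return connections

members = [f"Server{n}" for n in range(1, 11)]  # 10 members
hub, spokes = members[0], members[1:]

print(len(full_mesh(members)))                        # connections in a full mesh
print(len(hub_and_spoke(hub, spokes)))                # two-way hub and spoke
print(len(hub_and_spoke(hub, spokes, one_way=True)))  # hub pushes to read-only spokes
```

The number of full mesh connections grows with the square of the member count, while hub and spoke grows linearly, which is why the larger groups mentioned above tend to favor hub and spoke.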

Scheduling replication

If you schedule replication to run only during off hours, keep in mind that changes made on one end of a connection usually aren’t available to the other end until the next day. If the information is time sensitive, this delay could cause problems because files might not be synchronized when they’re needed. The trade-off for the delay is more bandwidth available for other functions during peak hours.

Remote Differential Compression

Copying the contents of hundreds or thousands of files across a network can waste bandwidth, especially when the amount of data that actually changed is fairly small. DFS replication uses an algorithm known as remote differential compression (RDC), which replicates only the changes made in files. By default, RDC is used during replication. Because only pieces of files are transmitted across the network, the use of network bandwidth is reduced. The trade-off is increased CPU and disk I/O overhead on servers because they do extra work to update files with the replicated changes. When you have a good combination of enough bandwidth and fewer files to synchronize, you might want to disable RDC. To do so, follow these general steps:
1. In the DFS Management console, click the replication group in the left pane, and in the center pane, click the Connections tab.
2. Right-click a member server and click Properties. Next, clear the “Use remote differential compression (RDC)” check box, and then click Apply.
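
The idea of sending only the changed pieces of a file can be illustrated with a much simpler scheme than RDC itself. The Python sketch below compares hashes of fixed-size blocks and transmits only the blocks that differ. It is not the actual RDC algorithm, which is more sophisticated; the function names and the tiny block size are chosen only to keep the example readable.

```python
import hashlib

BLOCK = 4  # tiny block size so the example is easy to follow

def signatures(data):
    """Hash each fixed-size block; only these hashes need to be compared."""
    return [hashlib.sha256(data[i:i + BLOCK]).digest() for i in range(0, len(data), BLOCK)]

def changed_blocks(old, new):
    """Return {block_index: bytes} for blocks the receiver does not already have."""
    old_sigs = signatures(old)
    delta = {}
    for index, sig in enumerate(signatures(new)):
        if index >= len(old_sigs) or sig != old_sigs[index]:
            start = index * BLOCK
            delta[index] = new[start:start + BLOCK]
    return delta

def apply_delta(old, delta, new_length):
    """Rebuild the new file on the receiver from its old copy plus the delta."""
    blocks = [old[i:i + BLOCK] for i in range(0, new_length, BLOCK)]
    for index, block in delta.items():
        blocks[index] = block
    return b"".join(blocks)[:new_length]

old = b"The quick brown fox"
new = b"The quick green fox"
delta = changed_blocks(old, new)
sent = sum(len(block) for block in delta.values())
print(f"sent {sent} of {len(new)} bytes")
assert apply_delta(old, delta, len(new)) == new
```

The sketch also shows the trade-off mentioned above: both sides spend CPU time hashing and reassembling blocks in exchange for sending fewer bytes.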

When you create a replication group, you choose the folders to replicate in the Folders to Replicate window: click Add, and in the Add Folder to Replicate dialog box, click Browse. In the Browse For Folder dialog box, click the folder (ReplShare in this example), and then click OK. Any folder, whether it’s shared or not, can be in the replication group.

Staging folder and Conflict and Deleted folder

The Staging folder is where changed files are cached until they’re replicated; compression is performed on the sending server and decompression on the receiving server. By default, each replicated folder contains a hidden Staging folder: DFSRPrivate\Staging. The Staging folder’s size acts as a quota, and its default size is 4 GB. When the Staging folder reaches 90% of its defined size, the oldest staged files are deleted until it’s at 60%.
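
The quota behavior amounts to a high and low watermark cleanup. This Python sketch models it with a hypothetical clean_staging function: each staged file is a (timestamp, size) pair, sizes are in MB, and the 90% and 60% thresholds come from the description above.

```python
def clean_staging(staged, quota, high=0.90, low=0.60):
    """staged: list of (timestamp, size) tuples. Returns the files that are kept."""
    kept = sorted(staged)  # oldest first
    used = sum(size for _, size in kept)
    if used < high * quota:
        return kept  # below the high watermark, nothing is deleted
    while kept and used > low * quota:
        _, size = kept.pop(0)  # delete the oldest staged file
        used -= size
    return kept

QUOTA_MB = 4096  # the 4 GB default
staged = [(hour, 400) for hour in range(10)]  # ten 400 MB files, 4000 MB in total

remaining = clean_staging(staged, QUOTA_MB)
print(len(remaining), sum(size for _, size in remaining))
```

With a quota that is too small for the volume of changes, this cleanup runs often and files may have to be staged again, which is one reason the quota is adjustable.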

A conflict occurs when the same file is changed on two or more members before the changes have replicated, and DFS replication must determine which version to keep. It uses a “last writer wins” model to make this determination. The losing file is cached in the Conflict and Deleted folder, a hidden folder in the replicated folder named DFSRPrivate\ConflictandDeleted. Its default size is 4 GB. The log of the original names of files stored in this folder is written to the ConflictandDeletedManifest.xml file, which is also in the DFSRPrivate folder. Like the Staging folder, the Conflict and Deleted folder’s size can be changed, and the path can be changed to move the folder to another volume.
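
A “last writer wins” decision can be sketched in a few lines of Python. The Version class, the resolve_conflict function, and the member names are hypothetical, and the list stands in for the Conflict and Deleted folder; how the real service breaks an exact tie in timestamps is not covered here, so the sketch simply keeps the first version in that case.

```python
from dataclasses import dataclass

@dataclass
class Version:
    member: str      # server where this version was written
    modified: float  # last-write time, for example seconds since the epoch
    content: bytes

conflict_and_deleted = []  # stands in for the Conflict and Deleted folder

def resolve_conflict(a, b):
    """Keep the most recently written version and cache the losing one."""
    winner, loser = (a, b) if a.modified >= b.modified else (b, a)
    conflict_and_deleted.append(loser)
    return winner

v1 = Version("Server1", modified=1000.0, content=b"draft from Server1")
v2 = Version("Server2", modified=1005.0, content=b"draft from Server2")

kept = resolve_conflict(v1, v2)
print(kept.member, [v.member for v in conflict_and_deleted])
```

Because the losing version is cached rather than discarded, an administrator can still recover it if the most recent write was not the one the users wanted.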

To manage these settings in the DFS Management console, select the replication group. In the Memberships tab, open the properties for the replicated folder and replication member you want to change. The Staging folder’s settings can be changed in the Staging tab (see Figure 3-15), and the Conflict and Deleted folder can be changed in the Advanced tab.

Configuring fault tolerance and load balancing

To configure fault tolerance and load balancing, create identical folders on at least two servers and share them. Add the folders to an existing replication group or create a replication group for this purpose. Replicating the files ensures that you have an up-to-date copy on at least two servers for fault-tolerance and load-balancing purposes. The preferred topology for this replication group is full mesh because it makes sure all copies of files in the replication folders are consistent so that users aren’t using outdated files. Using a hub and spoke topology might cause delays in the replication process that could result in inconsistent file contents when a server fails. Finally, create a DFS namespace that includes targets of all folders in the replication group.

Cloning and recovering the DFS replication database

Cloning the replication database is a new feature in Windows Server 2012 R2.

  • When a member is added to a replication group or its replication database must be rebuilt, initial synchronization can take a long time. Importing a clone of the replication database can reduce this synchronization time by up to 99%, depending on the number of changes to the database that occur between exporting the clone and importing it. To create a clone, use the Export-DfsrClone command at a PowerShell prompt:
    Export-DfsrClone -Volume D: -Path D:\DFSRclone

In this command, D: is the drive letter of the volume containing the DFS database you want to export, and D:\DFSRclone is the destination folder the exported replication database files are written to. You then copy the folder with the exported files to a destination server for import.

  • Next, you use the following command at a PowerShell prompt to import the clone:
    Import-DfsrClone -Volume D: -Path D:\DFSRclone

A few factors to take into account when recovering a replication database with a clone:
  • Make sure there’s no replicated folder on the destination volume. You can’t merge a clone with an existing replication database.
  • Make sure there’s no write access to shares on the destination replication folders.
  • Remove the destination server from the affected replication group before importing the clone.