One: File system

1. What is a file system?

The software in an operating system that is responsible for managing and storing file information is called the file management system, or file system for short.

Put simply, a file system is a mechanism for storing and organizing files so that they are easy to find and access.

A file system organizes and allocates the space on a file storage device, and it stores, protects, and retrieves the files kept there.

It is responsible for creating files for users; storing, reading, modifying, and dumping files; controlling access to files; and removing files when users no longer need them.

As the kinds of files multiplied, more file systems were developed, and those file systems in turn need to be managed and organized.

2. Linux file system

Linux divides the file system into two layers: the VFS (virtual file system) and the specific file systems, as shown in the figure below:

VFS (Virtual Filesystem Switch), also called the virtual file system, is a kernel software layer: an abstraction layer above the concrete file systems. It handles all POSIX calls related to file systems and offers a common interface to the various file systems, so that upper-layer applications can access different file systems through one interface. It also provides a medium through which different file systems can communicate.

VFS is not a real file system. It exists only in memory and has no on-disk storage; it is set up at system startup and disappears when the system shuts down.

VFS is made up of superblock, inode, dentry, vfsmount, and other objects.

Linux supports many file systems, for example the common **ext2, ext3, ext4, sysfs, rootfs, proc…** and so on.

Two: VFS

1. Where VFS sits in the Linux architecture

From the user's point of view, the file system in Linux has three layers:

  • 1. The system calls of the upper file system layer (system-call interface);
  • 2. The virtual file system layer, VFS (Virtual File System);
  • 3. The various actual file systems mounted into the VFS.

The position of VFS in the overall Linux architecture is as follows:


Linux user programs use GLIBC (the POSIX-standard GNU C runtime library) as their runtime library, which in turn reaches the kernel through the system-call interface, SCI. SCI is the system call interface defined by the operating system kernel. This abstraction layer lets a user program's I/O operations be converted into kernel interface calls.
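The path from the runtime library down to the system-call interface can be made visible from user space. The following is a minimal sketch, assuming a Linux system with glibc; the helper names raw_getpid and raw_write are made up for illustration. It uses the generic syscall() entry point instead of the dedicated library wrappers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes syscall() in <unistd.h> on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel for the process ID through the
 * generic syscall() entry point rather than the getpid() wrapper. */
pid_t raw_getpid(void)
{
    long ret = syscall(SYS_getpid);
    assert(ret > 0);            /* getpid cannot fail */
    return (pid_t)ret;
}

/* Hypothetical helper: write a buffer with a raw SYS_write call.
 * Returns the number of bytes written, or -1 with errno set. */
long raw_write(int fd, const char *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_getpid() should return the same value as getpid(), since both end up at the same kernel entry; the library wrapper only hides the system call number.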

2. How to process files transparently?

Each file system is independent and has its own way of organizing data and its own operations. Users cannot be expected to understand every file system, so how can they process files transparently?

For example, if I want to read a file, calling read should be enough, and it should work whatever the underlying file system is. This is why a virtual file system is needed.

A single system can host multiple "actual file systems", for example ext2, ext3, fat32, ntfs… If I have several partitions, each partition can hold a different "actual file system".

Suppose three disk partitions use ext2, ext3, and fat32. The operations and data structures of each "actual file system" are certainly different, so how can users use them transparently?

This is where VFS comes in as the middle layer: users deal with VFS directly.

VFS is a software mechanism that exists only in memory. Each time the system initializes, Linux first constructs a VFS directory tree in memory (called a namespace in the source code).

The main job of VFS is to hide the different calling conventions of the underlying file systems from upper-layer applications by providing a unified calling interface. Its second job is to make it convenient to organize and manage different file systems.

VFS provides an abstraction layer that separates the POSIX API from the specific implementations for different storage devices, so that the underlying file system type and device type are transparent to upper-layer applications.

For example, read and write map to sys_read and sys_write in the kernel, and VFS then performs different concrete operations depending on which "actual file system" (which partition) is being operated on. This is handled with the familiar "hook structure" technique.

In other words, VFS provides abstract struct definitions, and each specific file system fills in its own fields and functions. This solves the problem of heterogeneity (the same mechanism is widely used in many kernel subsystems).
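The "hook structure" idea can be sketched in plain user-space C. This is a toy illustration, not kernel code: struct toy_fs_ops, toy_vfs_read, and the two fake file systems are hypothetical names invented here. Each "file system" fills in a table of function pointers, and the caller only ever goes through the table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operation table: each "actual file system" fills in its
 * own functions, mirroring how kernel file systems fill in VFS structs. */
struct toy_fs_ops {
    const char *name;
    size_t (*read)(char *buf, size_t len);
};

/* Copy a fixed message into buf, truncating to len - 1 bytes. */
static size_t copy_msg(char *buf, size_t len, const char *msg)
{
    size_t n = strlen(msg);
    if (len == 0)
        return 0;
    if (n > len - 1)
        n = len - 1;
    memcpy(buf, msg, n);
    buf[n] = '\0';
    return n;
}

size_t toy_ext2_read(char *buf, size_t len)
{
    return copy_msg(buf, len, "data from ext2");
}

size_t toy_fat32_read(char *buf, size_t len)
{
    return copy_msg(buf, len, "data from fat32");
}

const struct toy_fs_ops toy_ext2_ops  = { "ext2",  toy_ext2_read };
const struct toy_fs_ops toy_fat32_ops = { "fat32", toy_fat32_read };

/* The "VFS" entry point: callers never name a concrete file system; the
 * call is routed through whatever table the file is bound to. */
size_t toy_vfs_read(const struct toy_fs_ops *ops, char *buf, size_t len)
{
    assert(ops != NULL && ops->read != NULL);
    return ops->read(buf, len);
}
```

The caller of toy_vfs_read stays the same whichever table is passed in, which is the transparency the text describes.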

Three: The four objects of the Linux virtual file system

To manage and organize file systems, Linux creates a common root and a global file system tree. To access the files in a file system, you must first attach that file system to a directory in the global file system tree. This process is called mounting the file system, and the directory it is attached to is called the mount point.

The layout of a traditional file system on disk is as follows:

As the figure above shows, a file system usually begins with a boot block consisting of a disk sector. Its main purpose is to boot the operating system, and it is generally used only at startup.

Next comes the superblock, which mainly stores information about the structure of the file system on the physical disk and describes the size of each part.

Finally, the inode bitmap, the logical block bitmap, the inodes, and the logical blocks are laid out on the physical disk.

To manage the superblock, the inodes, and the logical blocks efficiently, Linux defines several data structures: the file system type, the inode, and the dentry.

The file system type determines the behavior of a file system. This data structure is used to construct instances of a file system type, and such an instance is also called a superblock instance.

The superblock reflects the overall control information of a file system. A superblock can exist in several forms. For a disk-based file system, it is stored in a specific format in a fixed area of the disk (depending on the file system type). When the file system is mounted, the contents of the on-disk superblock are read into memory and used to build a new superblock structure there.

An inode reflects the general metadata of a file system object. A dentry reflects the position of a file system object in the global file system tree.

These four Linux data structures are related to one another, as shown in the figure below:

Structure relations

1. Superblock (super block)

A superblock corresponds to one file system (a mounted file system of some type such as ext2; this means an actual file system, not the VFS).

As discussed earlier, a file system defines the data formats and operations used to manage files. The system's own files live in their own file system, and different disk partitions can hold different file systems. Each separate file system therefore has its own superblock, which stores the file system's type, size, status, and so on.

("File system" and "file system type" are not the same thing! One file system type can cover many file systems, that is, many super_block instances.)
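The difference between a file system type and a file system can be sketched with two small structures. This is a simplified user-space model with hypothetical names (toy_fs_type, toy_super_block, toy_mount); the kernel's own structures are far richer. One type object owns a list of superblock instances, one per mounted file system.

```c
#include <assert.h>
#include <stddef.h>

struct toy_fs_type;

/* One mounted file system instance (one partition, for example). */
struct toy_super_block {
    const char *dev_name;
    struct toy_fs_type *type;       /* the type this instance belongs to */
    struct toy_super_block *next;   /* next instance of the same type */
};

/* One file system type, such as ext2; it owns a list of instances. */
struct toy_fs_type {
    const char *name;
    struct toy_super_block *instances;
};

/* Attach a superblock to its type, as a mount would. */
void toy_mount(struct toy_fs_type *type, struct toy_super_block *sb,
               const char *dev_name)
{
    assert(type != NULL && sb != NULL);
    sb->dev_name = dev_name;
    sb->type = type;
    sb->next = type->instances;
    type->instances = sb;
}

/* Count how many mounted file systems share this type. */
int toy_count_instances(const struct toy_fs_type *type)
{
    int n = 0;
    for (const struct toy_super_block *sb = type->instances; sb; sb = sb->next)
        n++;
    return n;
}
```

Mounting two partitions that both use ext2 gives one toy_fs_type with two toy_super_block entries on its list.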

Since each file system has its own super_block, the operations behind each super_block must differ as well. That is why the super_block structure below contains the kind of abstract struct described earlier (for example, struct super_operations):

(Linux kernel 3.14)

struct super_block {
    struct list_head    s_list;     /* Keep this first */
    dev_t               s_dev;      /* search index; _not_ kdev_t */
    unsigned char       s_blocksize_bits;
    unsigned long       s_blocksize;
    loff_t              s_maxbytes; /* Max file size */
    struct file_system_type *s_type;
    const struct super_operations   *s_op;
    const struct dquot_operations   *dq_op;
    const struct quotactl_ops       *s_qcop;
    const struct export_operations  *s_export_op;
    unsigned long       s_flags;
    unsigned long       s_magic;
    struct dentry       *s_root;
    struct rw_semaphore s_umount;
    int                 s_count;
    atomic_t            s_active;
#ifdef CONFIG_SECURITY
    void                *s_security;
#endif
    const struct xattr_handler **s_xattr;

    struct list_head    s_inodes;   /* all inodes */
    struct hlist_bl_head s_anon;    /* anonymous dentries for (nfs) exporting */
    struct list_head    s_mounts;   /* list of mounts; _not_ for fs use */
    struct block_device *s_bdev;
    struct backing_dev_info *s_bdi;
    struct mtd_info     *s_mtd;
    struct hlist_node   s_instances;
    struct quota_info   s_dquot;    /* Diskquota specific options */

    struct sb_writers   s_writers;

    char s_id[32];                  /* Informational name */
    u8 s_uuid[16];                  /* UUID */

    void                *s_fs_info; /* Filesystem private info */
    unsigned int        s_max_links;
    fmode_t             s_mode;

    /* Granularity of c/m/atime in ns.
       Cannot be worse than a second */
    u32                 s_time_gran;

    /*
     * The next field is for VFS *only*. No filesystems have any business
     * even looking at it. You had been warned.
     */
    struct mutex s_vfs_rename_mutex;    /* Kludge */

    /*
     * Filesystem subtype.  If non-empty the filesystem type field
     * in /proc/mounts will be "type.subtype"
     */
    char *s_subtype;

    /*
     * Saved mount options for lazy filesystems using
     * generic_show_options()
     */
    char __rcu *s_options;
    const struct dentry_operations *s_d_op; /* default d_op for dentries */

    /*
     * Saved pool identifier for cleancache (-1 means none)
     */
    int cleancache_poolid;

    struct shrinker s_shrink;   /* per-sb shrinker handle */

    /* Number of inodes with nlink == 0 but still referenced */
    atomic_long_t s_remove_count;

    /* Being remounted read-only */
    int s_readonly_remount;

    /* AIO completions deferred from interrupt context */
    struct workqueue_struct *s_dio_done_wq;

    /*
     * Keep the lru lists last in the structure so they always sit on their
     * own individual cachelines.
     */
    struct list_lru     s_dentry_lru ____cacheline_aligned_in_smp;
    struct list_lru     s_inode_lru ____cacheline_aligned_in_smp;
    struct rcu_head     rcu;
};

Field descriptions:

| Field | Description |
| --- | --- |
| s_list | Pointers into the superblock list. struct list_head is a familiar structure: it holds just the prev and next fields used for linking. The kernel uses this one simple structure to link all super_block objects together. |
| s_dev | Identifier of the block device that contains the file system. For example, for /dev/hda1 the device identifier is 0x301. |
| s_blocksize_bits | Block size expressed as a number of bits. For example, 512 bytes is 9 bits. |
| s_blocksize | Block size of the file system, in bytes. |
| s_maxbytes | Maximum file size allowed, in bytes. |
| struct file_system_type *s_type | File system type (which type does this file system belong to: ext2 or fat32?). Note again that "file system" and "file system type" differ: one file system type can cover many file systems, that is, many super_block instances. More on this later. |
| struct super_operations *s_op | Pointer to the set of superblock operation functions of the specific file system. |
| struct dquot_operations *dq_op | Pointer to the set of quota operation functions of the specific file system. |
| struct quotactl_ops *s_qcop | Methods for configuring disk quotas; they handle requests from user space. |
| s_flags | Mount flags. |
| s_magic | Magic number that distinguishes this file system from others. |
| s_root | Dentry of the directory on which the specific file system is mounted. |
| s_umount | Semaphore that synchronizes reads and writes of the superblock. |
| s_count | Use count of the superblock. |
| s_active | Reference count. |
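Two of the numeric examples in the table can be checked with a few lines of C. This is a sketch under one stated assumption: 0x301 is read with the traditional 16-bit device number layout (high byte major, low byte minor), which is what gives major 3, minor 1 for /dev/hda1. The function names are made up for illustration.

```c
#include <assert.h>

/* Traditional 16-bit device number layout (an assumption of this sketch):
 * high byte = major number, low byte = minor number. */
unsigned int old_major(unsigned int dev) { return (dev >> 8) & 0xff; }
unsigned int old_minor(unsigned int dev) { return dev & 0xff; }

/* Return log2(block_size) for a power-of-two block size, which is the
 * relationship between s_blocksize and s_blocksize_bits. */
unsigned int blocksize_bits(unsigned long block_size)
{
    unsigned int bits = 0;
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    while ((1UL << bits) < block_size)
        bits++;
    return bits;
}
```

With these helpers, 0x301 splits into major 3 and minor 1, and a 512-byte block size gives 9 bits.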

Superblock methods

struct super_operations {
    // Creates and initializes a new inode object under the given superblock.
    struct inode *(*alloc_inode)(struct super_block *sb);
    // Releases the given inode.
    void (*destroy_inode)(struct inode *);

    // Called by the VFS when an inode is modified.
    void (*dirty_inode) (struct inode *, int flags);
    // Writes the given inode back to disk.
    int (*write_inode) (struct inode *, struct writeback_control *wbc);
    // Deletes an inode.
    int (*drop_inode) (struct inode *);
    void (*evict_inode) (struct inode *);
    // Releases the superblock.
    void (*put_super) (struct super_block *);
    // Synchronizes file system metadata with the file system on disk;
    // the wait parameter specifies whether the operation is synchronous.
    int (*sync_fs)(struct super_block *sb, int wait);
    int (*freeze_fs) (struct super_block *);
    int (*unfreeze_fs) (struct super_block *);
    // Gets the file system status and places the statistics in the kstatfs structure.
    int (*statfs) (struct dentry *, struct kstatfs *);
    int (*remount_fs) (struct super_block *, int *, char *);
    void (*umount_begin) (struct super_block *);

    int (*show_options)(struct seq_file *, struct dentry *);
    int (*show_devname)(struct seq_file *, struct dentry *);
    int (*show_path)(struct seq_file *, struct dentry *);
    int (*show_stats)(struct seq_file *, struct dentry *);
#ifdef CONFIG_QUOTA
    ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
    ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
#endif
    int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
    long (*nr_cached_objects)(struct super_block *, int);
    long (*free_cached_objects)(struct super_block *, long, int);
};

2. Index node (inode)

The index node (inode) stores information about the actual data. This information is called "metadata" (that is, the description of a file's attributes).

Examples: file size, device identifier, user identifier, group identifier, file mode, extended attributes, timestamps of the last access and modification, link count, pointers to the disk blocks that hold the content, file type, and so on.

(Note that data is split into metadata plus the data itself.)

Also note that there are two kinds of inode: the VFS inode and the inode of the specific file system. The former lives in memory, the latter on disk. Each time a file is used, the on-disk inode is read in to fill the in-memory inode, and that is how the on-disk inode is put to use.

How are inodes created?

Each inode is usually 128 or 256 bytes in size. The total number of inodes is fixed when the file system is formatted (modern operating systems can change it dynamically); typically one inode is created for every 2 KB of space.

Few files on a typical file system are smaller than 2 KB, so with one inode per 2 KB the inodes generally do not run out. A file system therefore starts with a default number of inodes when it is created, and the number can later change according to actual needs.

A note on inode numbers: an inode number is unique within a file system and identifies a distinct file. Inside Linux, files are accessed through their inode numbers; the file name is only there for the user's convenience.

When we open a file, the system first finds the inode number that corresponds to the file name, then uses the inode number to obtain the inode information, and finally uses the inode to find the blocks that hold the file data. At that point the file data can be processed.
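The name-to-inode step can be observed from user space with the POSIX stat() call, which reports the inode number in the st_ino field. The following is a small sketch; inode_of and same_file are hypothetical helper names, and the only library call used is stat().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Look up the inode number behind a path name with stat(2).
 * Returns 0 and stores the number in *ino, or -1 on failure. */
int inode_of(const char *path, ino_t *ino)
{
    struct stat st;
    assert(path != NULL && ino != NULL);
    if (stat(path, &st) != 0)
        return -1;
    *ino = st.st_ino;
    return 0;
}

/* Two names refer to the same file when both the device and the inode
 * number match; the name itself plays no part in the comparison. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return 0;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For instance, same_file("/", "/.") is expected to be true: the two names differ, but they resolve to the same inode.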

How do inodes relate to files?

When a file is created, it is assigned an inode. One inode corresponds to exactly one actual file, and a file has exactly one inode. The maximum number of inodes is therefore also the maximum number of files.

struct inode {
    umode_t             i_mode;     /* access permissions */
    unsigned short      i_opflags;
    kuid_t              i_uid;      /* owner's user id */
    kgid_t              i_gid;      /* owner's group id */
    unsigned int        i_flags;    /* file system flags */

#ifdef CONFIG_FS_POSIX_ACL
    struct posix_acl    *i_acl;
    struct posix_acl    *i_default_acl;
#endif

    const struct inode_operations   *i_op;  /* inode operations table */
    struct super_block  *i_sb;              /* associated superblock */
    struct address_space *i_mapping;        /* associated address mapping */

#ifdef CONFIG_SECURITY
    void                *i_security;
#endif

    /* Stat data, not accessed from path walking */
    unsigned long       i_ino;      /* inode number */
    /*
     * Filesystems may only read i_nlink directly.  They shall use the
     * following functions for modification:
     *
     *    (set|clear|inc|drop)_nlink
     *    inode_(inc|dec)_link_count
     */
    union {
        const unsigned int i_nlink;
        unsigned int __i_nlink;     /* number of hard links */
    };
    dev_t               i_rdev;     /* actual device identifier */
    loff_t              i_size;
    struct timespec     i_atime;    /* last access time */
    struct timespec     i_mtime;    /* last modification time */
    struct timespec     i_ctime;    /* last change time */
    spinlock_t          i_lock;     /* i_blocks, i_bytes, maybe i_size */
    unsigned short      i_bytes;    /* number of bytes used */
    unsigned int        i_blkbits;
    blkcnt_t            i_blocks;   /* number of blocks in the file */

#ifdef __NEED_I_SIZE_ORDERED
    seqcount_t          i_size_seqcount;
#endif

    /* Misc */
    unsigned long       i_state;
    struct mutex        i_mutex;

    unsigned long       dirtied_when;   /* jiffies of first dirtying */

    struct hlist_node   i_hash;     /* hash list node, speeds up lookup */
    struct list_head    i_wb_list;  /* backing dev IO list */
    struct list_head    i_lru;      /* inode LRU list (unused inodes) */
    struct list_head    i_sb_list;  /* links all inodes of one file system */
    union {
        struct hlist_head   i_dentry;   /* list of directory entries */
        struct rcu_head     i_rcu;
    };
    u64                 i_version;
    atomic_t            i_count;    /* reference count */
    atomic_t            i_dio_count;
    atomic_t            i_writecount;   /* writer count */
    const struct file_operations    *i_fop; /* former ->i_op->default_file_ops; file operations */
    struct file_lock    *i_flock;   /* file lock list */
    struct address_space i_data;    /* pages read and written through this inode */
#ifdef CONFIG_QUOTA
    struct dquot        *i_dquot[MAXQUOTAS];    /* disk quotas for this inode */
#endif
    struct list_head    i_devices;  /* device list (devices that share the same driver) */
    union {
        struct pipe_inode_info  *i_pipe;    /* pipe information */
        struct block_device     *i_bdev;    /* block device driver node */
        struct cdev             *i_cdev;    /* character device driver node */
    };

    __u32               i_generation;   /* inode version number */

#ifdef CONFIG_FSNOTIFY
    __u32               i_fsnotify_mask; /* all events this inode cares about */
    struct hlist_head   i_fsnotify_marks;
#endif

#ifdef CONFIG_IMA
    atomic_t            i_readcount; /* struct files open RO */
#endif
    void                *i_private; /* fs or device private pointer (private data) */
};

Note the four linked lists used to manage inodes; one of them is the inode hash table:

static struct hlist_head *inode_hashtable __read_mostly;

Inode methods

struct inode_operations {
    struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
    void * (*follow_link) (struct dentry *, struct nameidata *);
    int (*permission) (struct inode *, int);
    struct posix_acl * (*get_acl)(struct inode *, int);

    int (*readlink) (struct dentry *, char __user *,int);
    void (*put_link) (struct dentry *, struct nameidata *, void *);

    int (*create) (struct inode *,struct dentry *, umode_t, bool);
    int (*link) (struct dentry *,struct inode *,struct dentry *);
    int (*unlink) (struct inode *,struct dentry *);
    int (*symlink) (struct inode *,struct dentry *,const char *);
    int (*mkdir) (struct inode *,struct dentry *,umode_t);
    int (*rmdir) (struct inode *,struct dentry *);
    int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t);
    int (*rename) (struct inode *, struct dentry *,
                   struct inode *, struct dentry *);
    int (*rename2) (struct inode *, struct dentry *,
                    struct inode *, struct dentry *, unsigned int);
    int (*setattr) (struct dentry *, struct iattr *);
    int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
    int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
    ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t);
    ssize_t (*listxattr) (struct dentry *, char *, size_t);
    int (*removexattr) (struct dentry *, const char *);
    int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start, u64 len);
    int (*update_time)(struct inode *, struct timespec *, int);
    int (*atomic_open)(struct inode *, struct dentry *,
                       struct file *, unsigned open_flag,
                       umode_t create_mode, int *opened);
    int (*tmpfile) (struct inode *, struct dentry *, umode_t);
    int (*set_acl)(struct inode *, struct posix_acl *, int);
} ____cacheline_aligned;

Some of the important methods are explained below:

| Method | Meaning |
| --- | --- |
| create() | If the inode describes a directory, then whenever a file is created (or opened for creation) in that directory, the kernel must create an inode for it. The VFS does this by calling the i_op->create() function of the directory's inode. The first parameter is the inode of the directory, the second is the dentry of the new file, and the third is the access mode of the file. If the inode describes a regular file, its create function is never called. |
| lookup() | Finds the dentry for the specified file. |
| link() | Creates a hard link in the specified directory. It is ultimately invoked by the link() system call. The first parameter is the dentry of the original file, the second is the inode of the directory, and the third is the dentry of the link file. |
| unlink() | Deletes the specified hard link from a directory. It is ultimately invoked by the unlink() system call. The first parameter is the inode of the directory, and the second is the dentry of the file to delete. |
| symlink() | Creates a new symbolic link in a directory. |
| mkdir() | Creates a subdirectory under the specified directory by calling i_op->mkdir() on the current directory's inode. It is invoked by the mkdir() system call. The first parameter is the inode of the directory, the second is the dentry of the new subdirectory, and the third is the subdirectory's permissions. |
| rmdir() | Deletes a specified subdirectory from the directory described by the inode. It is ultimately invoked by the rmdir() system call. |
| mknod() | Creates a special file in the specified directory, such as a pipe, a device file, or a socket. |

3. Directory entry (dentry)

A directory entry describes the logical attributes of a file. It exists only in memory and has no corresponding on-disk description; more precisely, it is an in-memory cache of directory entries, designed to speed up lookups.

Note that both directories and regular files have directory entries, and all the entries together form one huge directory tree.

For example, when opening the file /home/xxx/yyy.txt, each of /, home, xxx, and yyy.txt is a directory entry. During lookup, the VFS goes through the directory entries level by level, finds the inode for each one, and follows the entries down to the final file.

Note: a directory is also a kind of file (so it has a corresponding inode). Opening a directory really means opening a directory file.
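The level-by-level lookup can be sketched with a toy directory tree. This is a simplified user-space model, not the kernel's path walk: struct toy_dentry and its helpers are hypothetical, and each entry keeps only a name, a parent pointer, and a list of children.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified directory entry. */
struct toy_dentry {
    const char *name;
    struct toy_dentry *parent;
    struct toy_dentry *first_child;
    struct toy_dentry *next_sibling;
};

/* Link child under parent. */
void toy_add_child(struct toy_dentry *parent, struct toy_dentry *child)
{
    assert(parent != NULL && child != NULL);
    child->parent = parent;
    child->next_sibling = parent->first_child;
    parent->first_child = child;
}

/* Find the child of dir whose name equals the first len bytes of name. */
static struct toy_dentry *toy_find(struct toy_dentry *dir,
                                   const char *name, size_t len)
{
    for (struct toy_dentry *d = dir->first_child; d; d = d->next_sibling)
        if (strlen(d->name) == len && memcmp(d->name, name, len) == 0)
            return d;
    return NULL;
}

/* Resolve an absolute path one component at a time, starting at root.
 * Returns NULL if any component is missing. */
struct toy_dentry *toy_path_walk(struct toy_dentry *root, const char *path)
{
    struct toy_dentry *cur = root;
    while (cur != NULL && *path != '\0') {
        while (*path == '/')
            path++;                         /* skip separators */
        size_t len = strcspn(path, "/");    /* length of next component */
        if (len == 0)
            break;                          /* trailing slash or end */
        cur = toy_find(cur, path, len);
        path += len;
    }
    return cur;
}
```

Walking "/home/xxx/yyy.txt" visits one entry per component, and going back up is a single step through the parent pointer.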

struct dentry {
    /* RCU lookup touched fields */
    unsigned int d_flags;           /* protected by d_lock */
    seqcount_t d_seq;               /* per dentry seqlock */
    struct hlist_bl_node d_hash;    /* lookup hash list */
    struct dentry *d_parent;        /* parent directory */
    struct qstr d_name;
    struct inode *d_inode;          /* Where the name belongs to - NULL is
                                     * negative; the associated inode */
    unsigned char d_iname[DNAME_INLINE_LEN];    /* small names (short file names) */

    /* Ref lookup also touches following */
    struct lockref d_lockref;       /* per-dentry lock and refcount */
    const struct dentry_operations *d_op;   /* dentry operations */
    struct super_block *d_sb;       /* The root of the dentry tree: superblock of
                                     * the file system this entry belongs to */
    unsigned long d_time;           /* used by d_revalidate (revalidation time) */
    void *d_fsdata;                 /* fs-specific data */

    struct list_head d_lru;         /* LRU list of unused dentries */
    /*
     * d_child and d_rcu can share memory
     */
    union {
        struct list_head d_child;   /* child of parent list; links this entry
                                     * into the parent's d_subdirs */
        struct rcu_head d_rcu;
    } d_u;
    struct list_head d_subdirs;     /* our children: head of the list of all
                                     * child entries of this directory */
    struct hlist_node d_alias;      /* inode alias list */
};

A valid dentry must have a corresponding inode, because a directory entry represents either a file or a directory, and a directory is itself a file. So as long as a dentry is valid, its d_inode pointer points to an inode structure. (A negative dentry, whose d_inode is NULL as the comment in the structure notes, records a name that does not exist.) An inode, on the other hand, can correspond to multiple dentries.

The whole structure is really a tree. If you have read my piece on the device model's kobject, you will recognize the idea: a directory is a file (kobject, inode) plus a layer of wrapping. The wrapping here consists of two added pointers: one points to the parent directory, and the other points to the head of the list of everything the directory contains (regular files and subdirectories).

That gives us the directory operations (for example, going back to the parent directory 【…】 takes a single pointer step, while entering a subdirectory means searching the child list, which takes several steps).

dentry operations (the inode already covers mkdir, rmdir, mknod, and so on)

struct dentry_operations {
    /* Determines whether the dentry object is still valid. The VFS calls
     * this when it is about to use a dentry from the dcache. */
    int (*d_revalidate)(struct dentry *, unsigned int);
    int (*d_weak_revalidate)(struct dentry *, unsigned int);
    /* Generates the hash value for this dentry. The VFS calls this when a
     * dentry is added to the hash table. */
    int (*d_hash)(const struct dentry *, struct qstr *);
    /* Compares the two file names name1 and name2. dcache_lock must be
     * held when this function is used. */
    int (*d_compare)(const struct dentry *, const struct dentry *,
                     unsigned int, const char *, const struct qstr *);
    /* Called by the VFS when d_count reaches 0. dcache_lock must be held
     * when this function is used. */
    int (*d_delete)(const struct dentry *);
    /* Called by the VFS when the dentry object is about to be released. */
    void (*d_release)(struct dentry *);
    void (*d_prune)(struct dentry *);
    /* Called by the VFS when a dentry loses its inode. */
    void (*d_iput)(struct dentry *, struct inode *);
    char *(*d_dname)(struct dentry *, char *, int);
    struct vfsmount *(*d_automount)(struct path *);
    int (*d_manage)(struct dentry *, bool);
} ____cacheline_aligned;

4. File object (file)

A file object describes a file that has been opened by a process. Because a file can be opened by several processes, one file can have several file objects. The file itself is unique, however, so its inode is unique and its directory entry is fixed as well.

A process operates on a file through a file descriptor. Each open file has a number (f_pos, a 64-bit loff_t in the structure below) that marks the position of the next byte to be read or written; this number is called the file position.

When a file is opened, the position normally starts at 0, apart from a few special cases. Linux uses the file structure to store the position of an open file, which is why file is called the open file description. The file structures form a doubly linked list called the system open file table.


struct file {
    union {
        struct llist_node   fu_llist;   /* links every open file in the file system into a list */
        struct rcu_head     fu_rcuhead;
    } f_u;
    struct path         f_path;
#define f_dentry    f_path.dentry
    struct inode        *f_inode;   /* cached value */
    const struct file_operations    *f_op;  /* pointer to the file operations table */

    /*
     * Protects f_ep_links, f_flags.
     * Must not be taken from IRQ context.
     */
    spinlock_t          f_lock;
    atomic_long_t       f_count;    /* usage count of the file object */
    unsigned int        f_flags;    /* flags specified when the file was opened */
    fmode_t             f_mode;     /* file access mode (permissions and so on) */
    struct mutex        f_pos_lock;
    loff_t              f_pos;      /* current file offset */
    struct fown_struct  f_owner;
    const struct cred   *f_cred;
    struct file_ra_state f_ra;      /* read-ahead state */

    u64                 f_version;  /* version number */
#ifdef CONFIG_SECURITY
    void                *f_security;    /* security module */
#endif
    /* needed for tty driver, and maybe others */
    void                *private_data;  /* private data */

#ifdef CONFIG_EPOLL
    /* Used by fs/eventpoll.c to link all the hooks to this file */
    struct list_head    f_ep_links;
    struct list_head    f_tfile_llink;
#endif /* #ifdef CONFIG_EPOLL */
    struct address_space *f_mapping;    /* page cache mapping */
#ifdef CONFIG_DEBUG_WRITECOUNT
    unsigned long f_mnt_write_state;
#endif
} __attribute__((aligned(4)));  /* lest something weird decides that 2 is OK */

A few fields deserve attention:

  1. First, f_flags, f_mode and f_pos hold the control information for this process's current use of the file. They matter because one file can be opened by several processes at the same time, and each process operates on it independently, so every open needs its own flags, mode and offset.
  2. f_count is the reference count. Closing a file descriptor of a process does not necessarily close the file; it only decrements f_count, and the file is really closed only when f_count reaches 0. Operations such as dup and fork increase f_count; the details come later.
  3. f_op is also very important. It points to the structure that holds all the operations on the file. For example, a user's read eventually calls the read operation in file_operations, and the file_operations structure is not necessarily the same for different file systems. One important operation is the release function. When the user calls close, the kernel decrements f_count, and only when the reference count drops to 0 does it call release and actually close the file. This is why, as said above, a user's close is really a decrement of f_count.
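The effect of one file object being shared (dup) versus two separate file objects (a second open) can be observed from user space. The following is a minimal sketch, assuming a POSIX system with a writable /tmp; the helper name demo_shared_offset and the temporary file name are made up for illustration and are not part of any kernel or libc API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical demo helper.
 * Returns 1 if dup() shares the file offset and a second open() does not.
 */
int demo_shared_offset(void)
{
    char path[] = "/tmp/vfs_demo_XXXXXX";   /* arbitrary temp file name */
    int fd1 = mkstemp(path);                /* first file object, opened read/write */
    assert(fd1 >= 0);                       /* demo code: setup failure is fatal */
    ssize_t written = write(fd1, "abcdef", 6);
    assert(written == 6);

    int fd2 = dup(fd1);                     /* same file object, one more reference */
    int fd3 = open(path, O_RDONLY);         /* a second, independent file object */
    assert(fd2 >= 0 && fd3 >= 0);

    lseek(fd1, 2, SEEK_SET);                /* move the offset through fd1 only */
    off_t via_dup  = lseek(fd2, 0, SEEK_CUR);   /* offset seen through the duplicate */
    off_t via_open = lseek(fd3, 0, SEEK_CUR);   /* offset of the separate open */

    close(fd1);                             /* drops one reference; fd2 stays usable */
    char c = 0;
    ssize_t n = read(fd2, &c, 1);           /* still reads from the shared offset */

    close(fd2);
    close(fd3);
    unlink(path);
    return via_dup == 2 && via_open == 0 && n == 1 && c == 'c';
}
```

Calling demo_shared_offset() from main should return 1: the duplicate sees the offset set through fd1, the second open keeps its own offset, and closing fd1 does not close the file that fd2 still references.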

Note: file objects that are "in use" and "not used" are managed with a doubly linked list.


The file structure above describes a single file. A process (user) can work with many files at the same time, so another structure is needed to manage all of them.

That structure is the user open file table: files_struct

struct files_struct {
        atomic_t count;
        rwlock_t file_lock;     /* Protects all the below members.  Nests inside tsk->alloc_lock */
        int max_fds;
        int max_fdset;
        int next_fd;
        struct file ** fd;      /* current fd array */
        fd_set *close_on_exec;
        fd_set *open_fds;
        fd_set close_on_exec_init;
        fd_set open_fds_init;
        struct file * fd_array[NR_OPEN_DEFAULT];
};

Some of the fields:

| Field | Description |
| --- | --- |
| count | Reference count |
| file_lock | Lock that protects the fields below |
| max_fds | Current maximum number of file objects |
| max_fdset | Current maximum number of file descriptors |
| next_fd | Largest allocated file descriptor plus 1 |
| fd | Pointer to the array of file object pointers. It usually points to the last field, fd_array. When the number of open files exceeds NR_OPEN_DEFAULT, a new array is allocated and fd is pointed at it |
| close_on_exec | Pointer to the set of file descriptors to be closed on exec() |
| open_fds | Pointer to the set of open file descriptors |
| close_on_exec_init | Initial set of file descriptors to be closed on exec() |
| open_fds_init | Initial set of open file descriptors |
| fd_array | Initial array of file object pointers |
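The way fd starts out pointing at the embedded fd_array and switches to a larger array once it fills up can be sketched in user space. This is a toy model, not kernel code: all toy_ names are invented for illustration, TOY_NR_OPEN_DEFAULT is deliberately tiny, and doubling the capacity is an assumption of the sketch rather than the kernel's actual growth policy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NR_OPEN_DEFAULT 4   /* tiny on purpose; the real constant is larger */

struct toy_file { int id; };    /* stand-in for struct file */

struct toy_files {
    int max_fds;                                    /* capacity of fd[] */
    struct toy_file **fd;                           /* current fd array */
    struct toy_file *fd_array[TOY_NR_OPEN_DEFAULT]; /* embedded initial array */
};

void toy_files_init(struct toy_files *t)
{
    memset(t, 0, sizeof(*t));
    t->max_fds = TOY_NR_OPEN_DEFAULT;
    t->fd = t->fd_array;        /* start out pointing at the embedded array */
}

/* Put f in the lowest free slot and return its index, or -1 on failure. */
int toy_fd_install(struct toy_files *t, struct toy_file *f)
{
    int i;

    assert(t != NULL && f != NULL);
    for (i = 0; i < t->max_fds; i++)
        if (t->fd[i] == NULL)
            break;

    if (i == t->max_fds) {      /* table full: allocate a larger array */
        int new_max = t->max_fds * 2;
        struct toy_file **bigger = calloc((size_t)new_max, sizeof(*bigger));
        if (bigger == NULL)
            return -1;
        memcpy(bigger, t->fd, (size_t)t->max_fds * sizeof(*bigger));
        if (t->fd != t->fd_array)
            free(t->fd);        /* only heap-allocated arrays are freed */
        t->fd = bigger;         /* fd now points at the new array */
        t->max_fds = new_max;
    }
    t->fd[i] = f;
    return i;
}

/* Free a slot so that the descriptor number can be handed out again. */
void toy_fd_close(struct toy_files *t, int fd)
{
    if (fd >= 0 && fd < t->max_fds)
        t->fd[fd] = NULL;
}

/* Release the enlarged array, if any, and fall back to the embedded one. */
void toy_files_destroy(struct toy_files *t)
{
    if (t->fd != t->fd_array)
        free(t->fd);
    t->fd = t->fd_array;
    t->max_fds = TOY_NR_OPEN_DEFAULT;
}
```

In this sketch the first four installs land in fd_array; the fifth triggers the switch, after which fd no longer equals fd_array.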


The file and files_struct structures above record the file-related information of a process. The information that describes the process's own relationship to the file system is kept in the fs_struct structure.

struct fs_struct {
        atomic_t count;
        rwlock_t lock;
        int umask;
        struct dentry * root, * pwd, * altroot;
        struct vfsmount * rootmnt, * pwdmnt, * altrootmnt;
};

Some of the fields:

| Field | Description |
| --- | --- |
| count | Reference count |
| lock | Protection lock |
| umask | Permission mask applied by default to newly created files |
| root | The root directory of the process |
| pwd | The current working directory of the process |
| altroot | The alternate root directory set by the user |

Note: in practice, these three directories need not all be in the same file system. For example, the root directory of a process is usually on the ext file system mounted at "/", the current working directory may be on a file system mounted at /etc, and the alternate root may be on yet another file system.
rootmnt, pwdmnt, altrootmnt: the mount points corresponding to the three directories above.
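The per-process umask can be observed from user space: the permission bits requested at creation time are filtered by the mask. The following is a small sketch, assuming a POSIX system with a writable /tmp that has no default ACLs; the helper name mode_after_umask and the temporary directory name are invented for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a file asking for `requested` permission bits
 * while `mask` is the process umask, and return the bits the file really got
 * (or -1 on error).
 */
int mode_after_umask(mode_t mask, mode_t requested)
{
    char dir[] = "/tmp/vfs_umask_XXXXXX";   /* arbitrary temp directory name */
    char path[64];
    struct stat st;
    int result = -1;

    assert((requested & ~(mode_t)0777) == 0);   /* permission bits only */
    if (mkdtemp(dir) == NULL)                   /* created before the mask changes */
        return -1;
    snprintf(path, sizeof(path), "%s/f", dir);

    mode_t old = umask(mask);                   /* install the mask for this process */
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, requested);
    umask(old);                                 /* restore the previous mask */

    if (fd >= 0) {
        if (fstat(fd, &st) == 0)
            result = (int)(st.st_mode & 0777);  /* keep only the permission bits */
        close(fd);
        unlink(path);
    }
    rmdir(dir);
    return result;
}
```

With a mask of 022, a request for 0666 is expected to produce 0644, because the bits set in the mask are cleared from the requested mode.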

File methods (operations): file_operations

struct file_operations {
        struct module *owner;
        loff_t (*llseek) (struct file *, loff_t, int);
        ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
        ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
        ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
        ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
        ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
        ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
        int (*iterate) (struct file *, struct dir_context *);
        unsigned int (*poll) (struct file *, struct poll_table_struct *);
        long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
        long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
        int (*mmap) (struct file *, struct vm_area_struct *);
        int (*open) (struct inode *, struct file *);
        int (*flush) (struct file *, fl_owner_t id);
        int (*release) (struct inode *, struct file *);
        int (*fsync) (struct file *, loff_t, loff_t, int datasync);
        int (*aio_fsync) (struct kiocb *, int datasync);
        int (*fasync) (int, struct file *, int);
        int (*lock) (struct file *, int, struct file_lock *);
        ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
        unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
        int (*check_flags)(int);
        int (*flock) (struct file *, int, struct file_lock *);
        ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
        ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
        int (*setlease)(struct file *, long, struct file_lock **);
        long (*fallocate)(struct file *file, int mode, loff_t offset,
                          loff_t len);
        int (*show_fdinfo)(struct seq_file *m, struct file *f);
};

Driver developers should be most familiar with this structure, and it is one we have to master.

| Field | Description |
| --- | --- |
| owner | The module that owns this file operation structure, usually THIS_MODULE |
| llseek | Sets the file offset. The first parameter is the file to operate on, the second is the offset, and the third is where the offset is measured from (one of SEEK_SET, SEEK_CUR and SEEK_END) |
| read | Reads data from a file. The first parameter is the source file, the second is the destination buffer, the third is the number of bytes to read, and the fourth is the offset in the source file at which reading starts. Invoked by the read() system call |
| write | Writes data to a file. The first parameter is the destination file, the second is the source buffer, the third is the number of bytes to write, and the fourth is the offset in the destination file at which writing starts. Invoked by the write() system call |
| mmap | Maps the specified file into the specified address space. Invoked by the mmap() system call |
| open | Opens the specified file and associates it with the specified inode. Invoked by the open() system call |
| release | Releases an open file. Called when the reference count of the open file (f_count) reaches 0 |
| fsync | Writes the file's buffered data back to disk |
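The idea behind file_operations, a table of function pointers that each file system fills in differently while the generic layer calls through it, can be sketched in user space. This is a toy model under stated assumptions: the toy_ types, the two "file systems" and toy_vfs_read are all invented for illustration, and the signatures are simplified compared with the kernel's.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Cut-down operations table: only a read method. */
struct toy_file_operations {
    long (*read)(struct toy_file *f, char *buf, size_t count, long *pos);
};

struct toy_file {
    const struct toy_file_operations *f_op; /* chosen when the file is opened */
    const char *data;                       /* backing data for the toy */
    long f_pos;                             /* current offset */
};

/* "File system" A: returns the bytes as stored. */
static long plain_read(struct toy_file *f, char *buf, size_t count, long *pos)
{
    size_t len = strlen(f->data);
    size_t off = (size_t)*pos;

    if (off >= len)
        return 0;                           /* end of file */
    if (count > len - off)
        count = len - off;
    memcpy(buf, f->data + off, count);
    *pos += (long)count;                    /* advance the offset */
    return (long)count;
}

/* "File system" B: same data, upper-cased on the way out. */
static long upper_read(struct toy_file *f, char *buf, size_t count, long *pos)
{
    long n = plain_read(f, buf, count, pos);

    for (long i = 0; i < n; i++)
        buf[i] = (char)toupper((unsigned char)buf[i]);
    return n;
}

const struct toy_file_operations plain_fops = { .read = plain_read };
const struct toy_file_operations upper_fops = { .read = upper_read };

/* Generic layer: the caller never knows which implementation runs. */
long toy_vfs_read(struct toy_file *f, char *buf, size_t count)
{
    assert(f != NULL && buf != NULL);
    if (f->f_op == NULL || f->f_op->read == NULL)
        return -1;                          /* operation not supported */
    return f->f_op->read(f, buf, count, &f->f_pos);
}
```

Two toy_file objects over the same data but with different f_op pointers return different bytes from the same toy_vfs_read call, which is the dispatch pattern the table above describes.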

Four 、 The relationship between a process and the four structures

The structure the kernel uses to manage a process is task_struct.
When a process opens a file, the four important data structures above are involved:


Each process has its own namespace.

fs_struct represents the relationship between a process and the file system, such as the current working directory and the root directory of the process.

files_struct represents the files opened by the current process.

Every open file is represented by a file object.

In Linux, an open file is usually referred to by a file descriptor, which is always an integer greater than or equal to 0.
This integer is in fact the index into the file array fd in files_struct.
The descriptors of all open files are recorded in the open_fds bitmap.
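That a descriptor is just a slot number in the per-process table can be seen from user space: once a slot is freed, the same number is handed out again. A minimal sketch, assuming a single-threaded POSIX process and the existence of /dev/null; the helper name demo_fd_slot_reuse is made up for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a freed descriptor number is reused. */
int demo_fd_slot_reuse(void)
{
    int a = open("/dev/null", O_RDONLY);    /* takes the lowest free slot */
    int b = open("/dev/null", O_RDONLY);    /* takes the next free slot */
    assert(a >= 0 && b >= 0);               /* demo code: setup failure is fatal */

    close(a);                               /* slot a becomes free again */
    int c = open("/dev/null", O_RDONLY);    /* lowest free slot is a once more */
    assert(c >= 0);

    int reused = (c == a) && (b != a);
    close(b);
    close(c);
    return reused;
}
```

POSIX specifies that open returns the lowest-numbered descriptor not currently open in the process, which is why the third open is expected to get the number of the descriptor that was just closed.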

Figure: The relationship between processes and superblocks, files, inodes and directory entries

The figure shows that:

  1. A process uses the files field of task_struct, which points to a files_struct, to find the file objects it currently has open. What we usually call a file descriptor is in fact the index into the array of file objects opened by the process.
  2. The file object uses its f_dentry field to find the corresponding dentry object, and the dentry object's d_inode field leads to the corresponding inode. (From the inode we can reach the superblock information and the final file operations; this is the path followed when a file is opened.) This establishes the association between the file object and the actual physical file.
  3. The file operation table of a file object is obtained from the inode's i_fop field, and i_fop is in turn initialized through struct super_operations *s_op.
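The three steps above amount to following a chain of pointers. The sketch below models that chain with heavily reduced user-space structures; every toy_ name is invented for illustration, and only the field names (files, fd, f_dentry, d_inode, i_fop, i_sb, f_op) mirror those mentioned in the text.

```c
#include <assert.h>
#include <stddef.h>

struct toy_file_operations { const char *name; };
struct toy_super_block     { const char *fs_name; };

struct toy_inode {
    unsigned long i_ino;
    const struct toy_file_operations *i_fop;    /* set up by the file system */
    struct toy_super_block *i_sb;               /* owning superblock */
};

struct toy_dentry { const char *d_name; struct toy_inode *d_inode; };

struct toy_file {
    struct toy_dentry *f_dentry;
    const struct toy_file_operations *f_op;
};

struct toy_files_struct { int max_fds; struct toy_file **fd; };
struct toy_task_struct  { struct toy_files_struct *files; };

/* Step 1: descriptor -> file object, via task->files->fd[]. */
struct toy_file *toy_fget(struct toy_task_struct *task, int fd)
{
    struct toy_files_struct *files = task->files;

    if (files == NULL || fd < 0 || fd >= files->max_fds)
        return NULL;
    return files->fd[fd];
}

/* Step 2: file object -> dentry -> inode. */
struct toy_inode *toy_fd_to_inode(struct toy_task_struct *task, int fd)
{
    struct toy_file *file = toy_fget(task, fd);

    if (file == NULL || file->f_dentry == NULL)
        return NULL;
    return file->f_dentry->d_inode;
}

/* Step 3: at open time the file takes its operation table from the inode. */
void toy_open_bind_ops(struct toy_file *file)
{
    assert(file != NULL && file->f_dentry != NULL &&
           file->f_dentry->d_inode != NULL);
    file->f_op = file->f_dentry->d_inode->i_fop;
}
```

Given a task whose fd array holds a file in slot 3, toy_fd_to_inode(&task, 3) walks the same path as items 1 and 2, and toy_open_bind_ops copies the inode's i_fop into f_op as in item 3.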

The inode and dentry in VFS are related to the inode and dentry of an actual file system, but they are not the same thing.

The inode and dentry of a real disk file exist in physical external storage, whereas the VFS inode and dentry live in memory: the system reads the inode and dentry information from external storage, processes it, and builds the in-memory inode and dentry.

Virtual file systems also have inode and dentry structures, but these are generated by the system according to its own rules and do not exist in actual external storage.

Five 、 Disk and file system

Suppose a disk is divided into several partitions, each holding a different file system.
Figure: Disk and file system

Reference article :csdn- Knife knife