Saturday, November 27, 2010

Debunking UNIX file links myths

This may be a pretty newbie topic for some, but yes, I only debunked this myth for myself today. So what's the difference between a symbolic link and a hard link?

I realized that hard links are, well, hard. That's not a lame pun; it's the same sense in which we call hardware "hardware". Hard links live at a very low level of the filesystem. Every filename you see in an ls output is a hard link: a directory entry that points to an inode, which in turn records where the file's data is stored. Therefore, when you create another hard link to an existing file with a command like 'ln -T /usr/bin/firefox /usr/bin/firefox-browser', both firefox and firefox-browser point to the same inode, even though an ls output makes them look like two different files. Because a hard link is only an extra directory entry, creating one does not consume a new inode, which can help in the extreme case where we are running out of inodes (i.e., we cannot create a new file even though there is still unused space).

However, I feel that hard links are kind of messy and should only be used in certain cases, because a file's data is only freed once no directory entry points to its inode any more. For example, say we hard-link the firefox executable with that same command to maintain compatibility with an application that runs firefox by calling firefox-browser instead of firefox. Then the firefox executable gets updated, for example by the operating system's package manager. Can you see what is wrong now? Package managers typically install the new version as a new file and rename it over the old name, so /usr/bin/firefox now refers to a new inode, while the firefox-browser hard link still points to the old executable's inode! The program that depends on it may behave unexpectedly, because it is running an older version of the executable. Remember, the data is never deleted until all hard links to it are deleted. So in this particular scenario, using a soft link, aka symbolic link, is a lot safer.
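To see the "same inode" claim for yourself, here is a minimal shell sketch. It works in a throwaway directory instead of /usr/bin, and the file names original.txt and alias.txt are made up for illustration. It assumes a Unix-like system with mktemp, ls and awk: ls -i prints the inode number, and the second column of ls -l is the link count.

```shell
#!/bin/sh
# Sketch: two names, one inode. Runs in a throwaway directory so nothing
# under /usr/bin is touched; original.txt and alias.txt are hypothetical names.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

printf 'version 1\n' > original.txt
ln original.txt alias.txt            # plain ln (no -s) creates a hard link

# ls -i prints "<inode> <name>"; both names report the same inode number.
ino_orig=$(ls -i original.txt | awk '{print $1}')
ino_alias=$(ls -i alias.txt | awk '{print $1}')
echo "inodes: $ino_orig $ino_alias"

# The second column of ls -l is the link count: two names point at this inode.
links=$(ls -l alias.txt | awk '{print $2}')
echo "link count: $links"

# Removing one name only drops the link count; the data is still reachable.
rm original.txt
links_after=$(ls -l alias.txt | awk '{print $2}')
content=$(cat alias.txt)
echo "after rm: link count $links_after, content '$content'"
```

Both names report the same inode and a link count of 2, and after rm original.txt the data is still readable through alias.txt, now with a link count of 1.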
That is because a symbolic link, on the other hand, is a file of its own that records high-level information about where the link should point, i.e., the path of the target in the filesystem hierarchy rather than its inode. That path is looked up again every time the link is used, so if we create a symbolic link instead in the scenario above, it automatically leads to the new executable after the update. Okay, I typed all this out in a rush after this discovery, so correct me if I am wrong in any way.
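Here is a rough re-enactment of the firefox scenario, again as a shell sketch in a throwaway directory. "firefox" is just a small text file standing in for the real binary, and the upgrade is simulated under an assumption: the new version is written to a separate file and renamed over the old name with mv, as package managers typically do. readlink is used to show the path a symbolic link stores.

```shell
#!/bin/sh
# Sketch of the firefox scenario. "firefox" is a text file standing in for
# the real binary. ASSUMPTION: the upgrade writes a new file and renames it
# over the old name, which gives the name "firefox" a brand-new inode.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

printf 'firefox 1.0\n' > firefox
ln firefox firefox-browser          # hard link: shares firefox's current inode
ln -s firefox firefox-symlink       # symbolic link: stores the path "firefox"

target=$(readlink firefox-symlink)  # the symlink only records a path
echo "symlink stores: $target"

# Simulated upgrade: new file (new inode), renamed into place.
printf 'firefox 2.0\n' > firefox.new
mv firefox.new firefox

hard=$(cat firefox-browser)   # still the old inode
soft=$(cat firefox-symlink)   # the path "firefox" is resolved again
echo "hard link sees:     $hard"
echo "symbolic link sees: $soft"
```

The hard link still shows firefox 1.0 while the symbolic link shows firefox 2.0. Had the upgrade instead rewritten the file in place (for example printf 'firefox 2.0\n' > firefox), both links would see the new contents, because the hard link shares that inode.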

5 comments:

  1. Remember, the data is never deleted until all hard links to it are deleted.

    A Unix guru will say you never delete a file, you remove it. It is even neater to say you unlink it from the file system (look at the source of the rm command). ;)

  2. I summarized my view of the various link types recently: http://www.pixelbeat.org/docs/unix_links.html

  3. Thanks so much!
    I never realized they were so different. So what's the real use of a hard link?

  4. Hard links are useful when you want a copy to remain after deleting a file. Efficient backup is an example, where you create a link to the data at another path. Then if you replace (unlink, not truncate!) the original, the linked data is still available.

    For example, if you wanted to atomically replace file data, but also keep a copy, you could:

    cp -l -b -f file file
    $filter < file > file.tmp &&
    mv file.tmp file

  5. @Skimeteo

    I use hard links (well, the backup program I use does) to create incremental backups.

    To create a backup, the previous backup is copied (with hard links) to a new location, so I now have two (or more) backups, each with the complete directory structure, on my backup disk.
    After the hard-linked copy, rsync is run between my home directory and the newly copied backup directory.
    (rsnapshot does all of this automatically.)

    So now I have two "complete" backups, but only the changed files are stored on disk twice.

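The hard-link snapshot idea from the last two comments can be sketched in a few lines of shell. This is a minimal illustration, not how rsnapshot itself is implemented: it assumes GNU cp (whose -l option hard-links files instead of copying their data), the directory and file names are made up, and the rsync step is replaced by a plain copy-then-rename of the one changed file. rsync behaves similarly by default, writing a temporary file and renaming it into place unless --inplace is given.

```shell
#!/bin/sh
# Sketch of hard-link snapshots. ASSUMES GNU cp for -l (hard-link instead of
# copy). home, snap.0, snap.1, notes.txt and photo.jpg are hypothetical names.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

mkdir home
printf 'draft\n' > home/notes.txt
printf 'big unchanged data\n' > home/photo.jpg

# First snapshot: a real copy of the data.
cp -a home snap.0

# Next snapshot: copy the previous one with hard links (no file data duplicated).
cp -al snap.0 snap.1

# The user edits a file...
printf 'final\n' > home/notes.txt

# ...and the changed file enters the new snapshot via copy + rename
# (unlink, not truncate!), which breaks the link for that file only.
cp home/notes.txt snap.1/notes.txt.tmp
mv snap.1/notes.txt.tmp snap.1/notes.txt

old_notes=$(cat snap.0/notes.txt)
new_notes=$(cat snap.1/notes.txt)
photo_links=$(ls -l snap.1/photo.jpg | awk '{print $2}')
echo "snap.0 notes: $old_notes, snap.1 notes: $new_notes"
echo "photo.jpg link count: $photo_links (one copy shared by both snapshots)"
```

The old snapshot keeps the "draft" notes, the new one has "final", and the unchanged photo.jpg is stored once with a link count of 2, which is why only changed files cost extra disk space.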