This document describes Grex's password database. A good password database is a key to maintaining good security on a public access system. For more information, see our technical note on Grex Security Goals.
We will describe our password database in a historical manner, starting by telling about the standard Unix password database that came with SunOS 4.1.4, then describing the JFH shadow package that we replaced it with, and finally describing our own modifications to that package.
Standard Unix Password Database
All Unix systems have a file named /etc/passwd which contains one line of information about each user. A typical passwd file name from a standard Unix system looks like this:
janc:DIBU7epiSd4xs:2386:10:Jan Wolter:/a/j/a/janc:/bin/csh
This contains seven fields, separated by colons:
- name - the user's login name.
- passwd - An encrypted version of the user's password.
- uid - The user's user id number (a unique number used by Unix to identify users).
- gid - The user's group id number (on Grex, 10 is staff, and 50 and 51 are regular users).
- gecos - This normally contains the user's full name, possibly with office number, home phone and office phone given after it. It's called "gecos" for weird historical reasons.
- dir - This is the full path of the user's home directory.
- shell - This is the full path of the user's login shell program.
Note that actual passwords are never stored anywhere. Only an encrypted copy of the password is stored. When you type in your password the login program encrypts whatever you typed in, and compares the resulting string to the one stored in the passwd file. If they match, you must have typed in the correct password. If they don't, you didn't.
Passwords are encrypted using a version of the US National Bureau of Standard's DES (Data Encryption Standard) algorithm. The Unix password encryption algorithm takes the lowest 7 bits of each of the first 8 characters of your password to construct a 56-bit key which is used to repeatedly encrypt a constant string (usually a string of all zeros) into a 14 character string like the one in the example above.
The first two characters of the encrypted password aren't really part of the encryption. They are a randomly selected "salt" which is different for each user and is used to permute the algorithm in small ways, resulting in the encrypted password that is stored as the other twelve characters. Thus, if two different users set their password to the same thing, you won't be able to tell because with different salt, their entire encrypted passwords will be different.
The /etc/passwd file, which normally contain these encrypted passwords, is always publicly readable on Unix systems, so any user can see any other user's encrypted password. Back in the seventies this wasn't a real problem. Knowing someone's encrypted password couldn't get you access to their account, and decrypting a password was simply too hard. However, computers have gotten much, much faster, and there exist programs these days that can do a pretty decent job of cracking passwords these days. With 56-bit keys, there are 2**56 or 72,057,594,037,927,936 different possible passwords. Modern parallel computers can search that in reasonable amounts of time, and widely available programs like crack can do a pretty good job of searching the portion of that key space usually used by humans - mostly words and names. (That's why Grex's password programs are pretty picky about encouraging "non-obvious" passwords - it lengthens the list crackers have to search.)
The other problem with the traditional Unix password system is one of access time. The original designers of Unix didn't expect Unix systems with tens of thousands of users. Grex has about 25,000 accounts, which means that the passwd file is about 25,000 lines long. Every time the system needs to know any of the information stored in there, it has to search that huge file. This can be very slow.
Newer versions of Unix deal with both of these problems. Encrypted passwords are no longer stored in /etc/passwd, though all the other information will still be there, and there may be an alternate hashed password database that can be searched much more efficiently. Unfortunately, Grex runs SunOS 4.1, which is not a newer version of Unix and doesn't support do either of these things. So we had to fix it.
The JFH Password Database
Many years ago Grex's staff (mainly Greg Cronau) replaced the SunOS password system with a public domain system developed by Julianne F. Haugh. The JFH password system does two things that we badly needed.
First, encrypted passwords are no longer stored in /etc/passwd. If you look at that file, you will see that the lines look like:
janc:x:2386:10:Jan Wolter:/a/j/a/janc:/bin/csh
Here the encrypted password has been replaced by an x. (Some Unix systems put fake passwords here instead of an x, but we don't want to encourage hopeful hackers to download the whole useless /etc/passwd file over our overburdened internet connection - though plenty of people try it anyway. Duh.) The actual encrypted passwords are stored in a different file, called /etc/shadow, which, unlike /etc/passwd, is not readable. If you could read it, you would see that it contains lines that look like this:
janc:ytx9yoTz0xTiYv3eIkk9XOpF:10008:0:365:30:::
The fields are:
- name - the user's login name.
- passwd - the user's encrypted password.
- lastchange - The date the user's password was last changed (given as number of days since Jan 1, 1970).
- min - The minimum number of days after a change that you have to wait to change it again (zero for all users on Grex).
- max - The maximum number of days a user may go without changing a password before the system forces a password change (365 for most users on Grex).
- warn - The number of days before password expiration that the user is warned (30 days for all users on Grex).
- inactive - The number of days after password expiration until the account is considered inactive and is expired (not used on Grex).
- expire - The date on which an account expires (not used on Grex).
Obviously this does more for us than just hide the encrypted password. It gives us mechanisms to force users to change their password from time to time and do other handy things. But hiding the password is the main goal.
The alert reader will have noticed another difference. In the sample shadow line above, the encrypted password is 26 characters long, rather than the standard 14. You guessed it - there have been changes to the password encryption algorithm.
As we mentioned, the DES algorithm uses a 56 bit key, constructed by taking 7 bits from the first 8 characters of your password. But what happens if your password is longer than 8 characters? On standard Unix systems, the extra characters aren't used. That's a waste of perfectly good password.
So in the JFH system, if your password is more than 8 characters long, the next eight characters are also encrypted by the same algorithm. So the result is 2 bytes of salt, 12 bytes encrypted from the first 8 characters of your password, and 12 more bytes encrypted from the rest of your password with the same salt. String them all together and you have 26 bytes of encrypted password, like the sample above.
The JFH password system also addresses the speed question. In addition to the the standard /etc/passwd file, there are the /etc/passwd.pag and /etc/passwd.dir files. These contain exactly the same data as the /etc/passwd file, and, in fact, are generated from the /etc/passwd file, but they comprise a hashed database, which can be searched very rapidly. (To search a hashed file, you take what you are searching for, in this case the name "janc", and use some fixed algorithm to turn it into a number (e.g., you might add up all the letters). Then you use this number to index straight into the database to find the object you want. This can be very fast, but requires that the database be properly structured to make it work.) There are similar /etc/shadow.pag and /etc/shadow.dir files to enable fast searching of the shadow database.
The catch, of course, is that none of the Unix programs that come with SunOS know that the hashed database exists. We have rebuilt the most commonly used programs to use the hashed database for password file lookups, but some programs still search straight through the flat /etc/passwd file. In most cases, this is just a matter of recompiling them with the -lshadow flag on the compile command to link in the shadow library, which includes new versions of the getpwnam() and getpwuid() calls that use the hashed database.
Some programs we don't have source for. For example, the csh we are running is the original SunOS version, which doesn't know about the hashed password database. So when you do "cd ~janc" in csh it is likely to be very slow, but if you do the same thing in tcsh, which we were able to recompile, it is very fast.
Grex's Password Database
The JFH shadow system had several problems, so over the years the Grex staff (mainly Marcus Watts) has been making changes to it. There have been a number of small bug fixes, and two more significant changes.
The first change was related to the maintenance of the hash files. In the JFH shadow system, and in many other Unix versions with a hashed password database, the entire hash file has to be rebuilt every time there is any change. If your password file is big enough to need a hashed database, then rebuilding that database is going to be a slow process. So while the hash system makes accesses very fast, updates become very slow.
The general presumption is that it doesn't matter so much because updates are rare. You only need to rebuild the hash files when you create or delete an account, or when there is a change to a user's name, password, or login shell. However, on Grex we typically have over a hundred new accounts created through the newuser program every day. With Grex as slow as it was at that time, it could take as long as fifteen minutes to do each rebuild, and only one can be running at a time, so it could take about 25 hours a day to do all the account creations. (These days our computer is much faster, but we now have over 200 accounts created per day, so it's still nice to be fast). This is definitely a problem.
The solution, of course, is not to rebuild the entire database every time one thing changes. Instead, you modify the existing database to make only the necessary changes. This is much, much faster, but difficult to do correctly. Marcus has developed versions of the newuser, passwd, chsh and chfn commands that do this, as well as a program to allow staff members to modify, create and destroy accounts.
The other change we have made is to the password encryption algorithm. The method used by the JFH software to handle long passwords isn't really all that good. Those 26-character long encryptions aren't as formidable as they look. Though many users have passwords longer than 8 characters, they usually aren't more than a few characters longer. That means the second half of the password is usually an encryption of a one, two, or three-character long string. It's really easy to crack passwords that short, so you could probably crack most of the second halves of the passwords in a few minutes. Then you just half to crack the first halves as usual. Plus those few letters you got from the second half might help you guess the first half (e.g., if the second half the password of the user "satanboy" is "66" then you might guess that the first half is a seven letter word followed by a "6" - a program could try all those in a minute or two).
This isn't really that big a problem for Grex. After all, since the encrypted passwords are in a file readable only to root, we don't expect many people to ever see them. People can still try to guess passwords without having access to the encrypted strings. All they have to do is try their guesses at Grex's login prompt. But then they can't attack long passwords half at a time (so, in fact, passwords longer than eight characters are a lot more secure), and they can't run huge numbers of guesses very fast because, well, they'd be on Grex and Grex isn't fast.
But there was another reason we wanted to change the password encryption algorithm. Currently Grex is all one computer, but in the future we hope to use different machines for different services, for example, there might be a separate machine to handle E-mail. In such an environment, we'd want users' accounts to work on multiple machines. That means we'd need to use some form of distributed password system, like Kerberos. In such system, encrypted passwords can't stay stored safely away in a protected file on one machine. They have to be passed among machines, and as soon as they start wandering the net, the risks of someone getting a copy of one becomes much larger. We will need a more secure encryption method.
So Marcus Watts formulated the following requirements for a new encryption algorithm:
- Long passwords should be allowed, and all their characters (up to at least 128 characters) should be significant.
- All bits of the encrypted password should depend on all bits of the cleartext input.
- The salt should be based on some combination of the user's loginid and hostname, rather than being randomly generated as in DES password encryption. This is a signficant advantage in a Kerberos-like environment.
- We should not use an algorithm which, for legal reasons, cannot be exported from the USA. Thus DES is not acceptable, but md5 or sha-1 is.
- The algorithm should return a binary number that is then convertible into a fixed length ascii string suitable for inclusion in a Unix password file. The number should have as many bits as possible, while the string should be as short as possible.
- The code should be fast. Many encryption algorithms are deliberately slow, to slow down people who attack the password by trying every word in the dictionary. But Grex is already so slow that we can't afford a password encryption algorithm that further slows down the machine.
- It should be difficult or impossible to derive the cleartext password from the encrypted password using any method short of a brute force dictionary attack.
To replace the DES encryption, Marcus Watts has developed a new encryption algorithm based on the SHA (Secure Encryption Algorithm). The one tricky part in this is that password lengths vary, and SHA requires wants a constant length input. So the basic algorithm is:
- Concatinate together the following:
- 128 minus <password length> pad characters (all backslashes)
- the clear text password (truncated to 128 characters if it is longer)
- the user's name long
- the instance (currently always a null string)
- the realm (the name of the local host)
- Apply the SHS hashing function to the resulting string.
- Concatinate together the following:
- the clear text password (again)
- 128 minus <password length> pad characters (all less-than signs)
- the result of the previous SHS hash.
- Apply the SHS hashing function to the resulting string to get a 20-byte binary string.
- Convert the binary string into a string of 26 printable characters.
- Prepend the characters %! onto the string to make a 28 character string. This marks it as a result of this algorithm.
The reason for tagging the passwords generated by this algorithm with a %! sequence is that, for a while at least, the password file will contain a mixture of DES passwords and new passwords, and we need to be able to distinguish them. The reason for this is to smooth the transition from the old encryption to the new encryption. We can't easily convert existing passwords from the old scheme to the new scheme. You'd have to decrypt the old passwords before you could re-encrypt them with the new scheme. It's doubtful whether such a project could be considered practical or ethical.
So we are making the change gradually. For people with DES passwords, the lines in the /etc/shadow file look like the example above. For people with new passwords, the encryption string always starts with the three characters %!, like this:
janc:%!!@Tpf4)X^Pes@Xms8tq(M).U+#:10008:0:365:30:::
If the login program doesn't see the %!! in your encrypted password, it uses the DES encryption algorithm, if it does, it uses the new algorithm. All new accounts are now being created using the new encryption, and whenever an older user makes a password change, the password encryption is changed over to the new algorithm. So within a year (since passwords expire after a year), all Grex users will be using the new algorithm, and we will be ready to change over to a distributed password system.
Marcus's actual source for the kg_pwhash() routine that does the encryption is available on a separate page.