
To quote from [Guidelines and rules for GetHashCode][1] by Eric Lippert:

> Rule: **Consumers of GetHashCode cannot rely upon it being stable over time or across appdomains**
>
> Suppose you have a Customer object
> that has a bunch of fields like Name,
> Address, and so on. If you make two
> such objects with exactly the same
> data in two different processes, they
> do not have to return the same hash
> code. If you make such an object on
> Tuesday in one process, shut it down,
> and run the program again on
> Wednesday, the hash codes can be
> different.
>
> This has bitten people in the past.
> The documentation for
> System.String.GetHashCode notes
> specifically that two identical
> strings can have different hash codes
> in different versions of the CLR, and
> in fact they do. **Don't store string hashes in databases and expect them to be the same forever, because they won't be.**

So what is the correct way to create a hash code for a string that I can store in a database?

(Please tell me I am not the first person to have left this bug in software I have written!)

It depends on what properties you want the hash to have. For example, you *could* just write something like this:

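```cs
public int HashString(string text)
{
    // TODO: Determine nullity policy.

    // Let overflow wrap silently even if the project enables checked arithmetic.
    unchecked
    {
        int hash = 23;
        foreach (char c in text)
        {
            // Mix in each UTF-16 code unit exactly as stored (ordinal comparison).
            hash = hash * 31 + c;
        }
        return hash;
    }
}
```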

So long as you *document* that that is how the hash is computed, that's valid. It's in no way cryptographically secure or anything like that, but you can persist it with no problems. Two strings which are absolutely equal in the ordinal sense (i.e. with no cultural equality etc. applied, exactly character-by-character the same) will produce the same hash with this code.
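
Because the hash is ordinal, strings that *look* identical but use different code points hash differently: "café" written with the precomposed U+00E9 and "café" written as "e" plus the combining accent U+0301 give different results. If your application should treat such strings as equal (an assumption about your requirements, not something the answer above states), one option is to normalize before hashing. A minimal sketch, where `OrdinalHashing` and `HashNormalized` are hypothetical names:

```cs
using System.Text;

public static class OrdinalHashing
{
    // Same algorithm as HashString above: ordinal, one UTF-16 code unit at a time.
    public static int HashString(string text)
    {
        unchecked
        {
            int hash = 23;
            foreach (char c in text)
            {
                hash = hash * 31 + c;
            }
            return hash;
        }
    }

    // Hypothetical helper: map canonically equivalent strings to the same
    // code points (Unicode Normalization Form C) before hashing them.
    public static int HashNormalized(string text) =>
        HashString(text.Normalize(NormalizationForm.FormC));
}
```

Here `HashString("caf\u00e9")` gives 24286904 while `HashString("cafe\u0301")` gives 752890701, but `HashNormalized` returns 24286904 for both. If you normalize, the normalization form becomes part of the hash's definition and needs documenting along with the algorithm.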

The problems come when you rely on *undocumented* hashing - i.e. something which obeys `GetHashCode()` but is in no way guaranteed to remain the same from version to version... like `string.GetHashCode()`.

Writing and documenting your own hash like this is a bit like saying, "This sensitive information is hashed with MD5 (or whatever)". So long as it's a well-defined hash, that's fine.

EDIT: Other answers have suggested using cryptographic hashes such as SHA-1 or MD5. I would say that until we know there's a requirement for cryptographic security rather than just stability, there's no point in going through the rigmarole of converting the string to a byte array and hashing that. Of course if the hash *is* meant to be used for anything security-related, an industry-standard hash is *exactly* what you should be reaching for. But that wasn't mentioned anywhere in the question.
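
If the hash *is* security-related, the "convert to a byte array and hash that" route might look like the minimal sketch below. The algorithm (SHA-256) and the encoding (UTF-8) are assumptions, not part of the question; SHA-256 is used rather than MD5 or SHA-1 because practical collisions have been demonstrated for both. Whatever encoding you pick is part of the hash's definition and must stay fixed. `SHA256.HashData` and `Convert.ToHexString` require .NET 5 or later.

```cs
using System;
using System.Security.Cryptography;
using System.Text;

public static class CryptoStringHash
{
    // Encodes the string as UTF-8, hashes the bytes with SHA-256,
    // and returns the 32-byte digest as a 64-character uppercase hex string.
    public static string Sha256Hex(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        byte[] digest = SHA256.HashData(bytes);
        return Convert.ToHexString(digest);
    }
}
```

For example, `CryptoStringHash.Sha256Hex("abc")` returns `BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD`, the standard SHA-256 test vector for "abc".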

Here is a reimplementation of [the way .NET calculated its string hash code on 64-bit systems at the time of writing][1]. It does not use pointers like the real `GetHashCode()` does, so it will be slightly slower, but that also makes it more resilient to internal changes to `string`. It will also give a more evenly distributed hash code than [Jon Skeet's version][2], which may result in better lookup times in dictionaries.

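```cs
public static class StringExtensionMethods
{
    public static int GetStableHashCode(this string str)
    {
        unchecked
        {
            // Two accumulators: even-indexed chars feed hash1, odd-indexed chars feed hash2.
            int hash1 = 5381;
            int hash2 = hash1;

            // Like the framework code, stop at the first '\0' character.
            for (int i = 0; i < str.Length && str[i] != '\0'; i += 2)
            {
                // (h << 5) + h is h * 33, then XOR in the next char (djb2-style).
                hash1 = ((hash1 << 5) + hash1) ^ str[i];
                if (i == str.Length - 1 || str[i + 1] == '\0')
                    break;
                hash2 = ((hash2 << 5) + hash2) ^ str[i + 1];
            }

            // Combine the two accumulators into the final hash.
            return hash1 + (hash2 * 1566083941);
        }
    }
}
```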
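
The loop above stops at the first `'\0'`, which appears to mirror the framework's pointer-based loop over a null-terminated buffer. A side effect is that `"abc"` and `"abc\0def"` get the same hash. If your strings can contain embedded nulls, a variant that mixes in every `char` avoids that collision. In the sketch below, `GetStableHashCodeIncludingNulls` is a hypothetical name; the arithmetic is unchanged, so it returns the same values as `GetStableHashCode` for strings without `'\0'`, but it will no longer match the framework's output for strings that contain one.

```cs
public static class StringExtensionMethodsIncludingNulls
{
    // Hypothetical variant of GetStableHashCode: identical arithmetic, but
    // '\0' is treated as an ordinary char instead of ending the loop.
    public static int GetStableHashCodeIncludingNulls(this string str)
    {
        unchecked
        {
            int hash1 = 5381;
            int hash2 = hash1;

            for (int i = 0; i < str.Length; i += 2)
            {
                hash1 = ((hash1 << 5) + hash1) ^ str[i];
                if (i == str.Length - 1)
                    break; // odd length: no partner char for hash2
                hash2 = ((hash2 << 5) + hash2) ^ str[i + 1];
            }

            return hash1 + (hash2 * 1566083941);
        }
    }
}
```

With this variant, `"abc"` and `"abc\0"` hash to different values, whereas `GetStableHashCode` treats them as the same string.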

There is now the [System.IO.Hashing][1] package that provides stable and standardized non-cryptographic hash algorithms. While they are designed for byte sequences, it is fairly straightforward to use them safely and very efficiently through `Span`:

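```cs
using System;
using System.Runtime.InteropServices; // MemoryMarshal

var input = "Hello world";
// View the string's UTF-16 code units as raw bytes, without copying.
var inputBytes = MemoryMarshal.AsBytes(input.AsSpan());
var hash = System.IO.Hashing.XxHash32.HashToUInt32(inputBytes);
Console.WriteLine(hash); // 899079058
```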

Note, however, that because the characters are reinterpreted as bytes, the endianness of the system affects the result: on a big-endian system, the hash above will be different. If that is an issue, you can check `BitConverter.IsLittleEndian` and, if it's `false`, swap the two bytes of each `char` before hashing.
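
One way to do that swap is to always produce the UTF-16 code units in little-endian order, so the bytes you hash are the same as those the snippet above sees on a little-endian machine. A minimal sketch, where `StableBytes.ToUtf16LittleEndian` is a hypothetical helper built only on `MemoryMarshal` and `BinaryPrimitives` from the base library:

```cs
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

public static class StableBytes
{
    // Returns the string's UTF-16 code units as little-endian bytes on any platform.
    public static byte[] ToUtf16LittleEndian(string text)
    {
        ReadOnlySpan<char> chars = text.AsSpan();
        var buffer = new byte[chars.Length * sizeof(char)];

        // Copy the raw code units in the machine's native byte order.
        MemoryMarshal.AsBytes(chars).CopyTo(buffer);

        if (!BitConverter.IsLittleEndian)
        {
            // On big-endian hardware, swap the two bytes of each 16-bit unit in place.
            Span<ushort> units = MemoryMarshal.Cast<byte, ushort>(buffer.AsSpan());
            for (int i = 0; i < units.Length; i++)
            {
                units[i] = BinaryPrimitives.ReverseEndianness(units[i]);
            }
        }

        return buffer;
    }
}
```

The result can be passed to `System.IO.Hashing.XxHash32.HashToUInt32` as before. `Encoding.Unicode.GetBytes(input)` also produces little-endian UTF-16, but by default it replaces unpaired surrogates with U+FFFD, so different strings can end up with the same bytes; copying the raw code units avoids that.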

