最近看到了好几篇关于string.intern()的技术博客,
这两篇文章介绍了一些intern的内幕,以及使用上应该避免的坑。但是没有更过的介绍intern的实现细节,于是自己去看了一下jvm的具体实现,在这里做一个笔记记录一下,方便以后翻阅。
以下的源码基于openjdk-8-src-b132-03
string的intern()入口在jdk/src/share/native/java/lang/String.c,
#include "jvm.h"
#include "java_lang_String.h"
JNIEXPORT jobject JNICALL
Java_java_lang_String_intern(JNIEnv *env, jobject this)
{
return JVM_InternString(env, this);
}
进入到JVM_InternString中,
JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str))
JVMWrapper("JVM_InternString");
JvmtiVMObjectAllocEventCollector oam;
if (str == NULL) return NULL;
oop string = JNIHandles::resolve_non_null(str);
oop result = StringTable::intern(string, CHECK_NULL);
return (jstring) JNIHandles::make_local(env, result);
JVM_END
有些人可能会疑惑JVM_ENTRY是啥意思,其实没有什么太神秘的,由于jvm内部很多函数调用的最开始都要进行相同的操作,所以jvm将这些相同的操作抽象成了一组宏。
#define JVM_ENTRY(result_type, header) \
extern "C" { \
result_type JNICALL header { \
JavaThread* thread=JavaThread::thread_from_jni_environment(env); \
ThreadInVMfromNative __tiv(thread); \
debug_only(VMNativeEntryWrapper __vew;) \
VM_ENTRY_BASE(result_type, header, thread)
代码的主要处理调用的使StringTable::intern(),进入到这个函数看看。
oop StringTable::intern(Handle string_or_null, jchar* name,
int len, TRAPS) {
unsigned int hashValue = hash_string(name, len);
int index = the_table()->hash_to_index(hashValue);
oop found_string = the_table()->lookup(index, name, len, hashValue);
// Found
if (found_string != NULL) return found_string;
debug_only(StableMemoryChecker smc(name, len * sizeof(name[0])));
assert(!Universe::heap()->is_in_reserved(name),
"proposed name of symbol must be stable");
Handle string;
// try to reuse the string if possible
if (!string_or_null.is_null()) {
string = string_or_null;
} else {
string = java_lang_String::create_from_unicode(name, len, CHECK_NULL);
}
// Grab the StringTable_lock before getting the_table() because it could
// change at safepoint.
MutexLocker ml(StringTable_lock, THREAD);
// Otherwise, add to symbol to table
return the_table()->basic_add(index, string, name, len,
hashValue, CHECK_NULL);
}
主要分为以下几步
- 根据查询的字符串以及长度计算hash值,
unsigned int StringTable::hash_string(const jchar* s, int len) {
return use_alternate_hashcode() ? AltHashing::murmur3_32(seed(), s, len) :
java_lang_String::hash_code(s, len);
}
- 根据hash值以及table的size,计算index值
int hash_to_index(unsigned int full_hash) {
int h = full_hash % _table_size;
assert(h >= 0 && h < _table_size, "Illegal hash value");
return h; }
- 字符串查找
oop StringTable::lookup(int index, jchar* name,
int len, unsigned int hash) {
int count = 0;
for (HashtableEntry<oop, mtSymbol>* l = bucket(index); l != NULL; l = l->next()) {
count++;
// 找到的条件
if (l->hash() == hash) {
if (java_lang_String::equals(l->literal(), name, len)) {
return l->literal();
}
}
}
// 是否需要rebash
// If the bucket size is too deep check if this hash code is insufficient.
if (count >= BasicHashtable<mtSymbol>::rehash_count && !needs_rehashing()) {
_needs_rehashing = check_rehash_table(count);
}
return NULL;
}
首先根据index值取出对应的bucket,然后依次遍历bucket中的每个值,如果相同则返回。如果找不到则判断是否需要进行rehash操作。
- 如果找到了,则直接返回
if (found_string != NULL) return found_string;
-
如果没有则获取锁
-
新建string,并且更新StringTable
oop StringTable::basic_add(int index_arg, Handle string, jchar* name,
int len, unsigned int hashValue_arg, TRAPS) {
assert(java_lang_String::equals(string(), name, len),
"string must be properly initialized");
// Cannot hit a safepoint in this function because the "this" pointer can move.
No_Safepoint_Verifier nsv;
// Check if the symbol table has been rehashed, if so, need to recalculate
// the hash value and index before second lookup.
unsigned int hashValue;
int index;
if (use_alternate_hashcode()) {
hashValue = hash_string(name, len);
index = hash_to_index(hashValue);
} else {
hashValue = hashValue_arg;
index = index_arg;
}
// Since look-up was done lock-free, we need to check if another
// thread beat us in the race to insert the symbol.
oop test = lookup(index, name, len, hashValue); // calls lookup(u1*, int)
if (test != NULL) {
// Entry already added
return test;
}
HashtableEntry<oop, mtSymbol>* entry = new_entry(hashValue, string());
add_entry(index, entry);
return string();
}
在插入之前需要判断是否相同的值已经插入了stringtable,因为最开始的looup调用是lock-free的。如果已经有其他线程抢先更新了,则直接返回。否则进行添加。