背景

java是一门面向对象的语言,与过程式语言不同的是,我们要进行的操作往往都和某个对象相关联。我们知道我们使用对象的时候,需要通过new操作符初始化一个对象,但是new操作符具体都做了什么呢?相信这个问题很多人虽然能简要回答一些,比如new会在堆内存区域开辟一个空间然后将对象的数据放入这个区域,如果再继续深入追问,很多人都会比较迷茫。最近正好在看jvm的源码,借此机会了解一下对象分配的完整过程,和大家分享一下。

为了更好的阐述对象的分配流程,我们需要一个简单的例子作为入口。下面的代码很简单,只有一行有效代码: 初始化一个类型为Test的对象。

public class Test {
    public String str = "abcdefg";
    public static void main(String[] args) {
        Test t = new Test();
    }
}

new操作

我们知道java是一门解释型语言(不考虑编译优化),所有的操作最终都会通过某一条字节码指令完成,new操作也不例外。new操作在jvm的规范中对应的字节码也是new,我们看一下jvm是如何实现这个指令的(下面的内容均基于openjdk8)。

//------------------------------------------------------------------------------------------------------------------------
// Allocation

IRT_ENTRY(void, InterpreterRuntime::_new(JavaThread* thread, ConstantPool* pool, int index))
  Klass* k_oop = pool->klass_at(index, CHECK);
  instanceKlassHandle klass (THREAD, k_oop);

  // Make sure we are not instantiating an abstract klass
  klass->check_valid_for_instantiation(true, CHECK);

  // Make sure klass is initialized
  klass->initialize(CHECK);

  // At this point the class may not be fully initialized
  // because of recursive initialization. If it is fully
  // initialized & has_finalized is not set, we rewrite
  // it into its fast version (Note: no locking is needed
  // here since this is an atomic byte write and can be
  // done more than once).
  //
  // Note: In case of classes with has_finalized we don't
  //       rewrite since that saves us an extra check in
  //       the fast version which then would call the
  //       slow version anyway (and do a call back into
  //       Java).
  //       If we have a breakpoint, then we don't rewrite
  //       because the _breakpoint bytecode would be lost.
  oop obj = klass->allocate_instance(CHECK);
  thread->set_vm_result(obj);
IRT_END

上面的_new方法就是字节码new的入口,这段代码主要做了两件事情,

  1. 在常量池中查找需要初始化对象对应的类信息
  2. 根据查找到的类信息,初始化对象 而初始化的操作则封装在allocate_instance中, ``` instanceOop InstanceKlass::allocate_instance(TRAPS) { bool has_finalizer_flag = has_finalizer(); // Query before possible GC int size = size_helper(); // Query before forming handle.

KlassHandle h_k(THREAD, this);

instanceOop i;

i = (instanceOop)CollectedHeap::obj_allocate(h_k, size, CHECK_NULL); if (has_finalizer_flag && !RegisterFinalizersAtInit) { i = register_finalizer(i, CHECK_NULL); } return i;

上面的代码主要干了两件事情,
1. 调用堆的obj_allocate方法分配对象
2. 如果类实现了finalize方法,则调用register_finalizer注册finalizer。

obj_allocate会调用common_mem_allocate_init分配对象,

HeapWord* CollectedHeap::common_mem_allocate_init(KlassHandle klass, size_t size, TRAPS) { HeapWord* obj = common_mem_allocate_noinit(klass, size, CHECK_NULL); init_obj(obj, size); return obj; }

对象的分配分为两步,
1. 分配空间
2. 对象的初始化

## 分配空间

我们先来看一下分配空间,

HeapWord* CollectedHeap::common_mem_allocate_noinit(KlassHandle klass, size_t size, TRAPS) {

// Clear unhandled oops for memory allocation. Memory allocation might // not take out a lock if from tlab, so clear here. CHECK_UNHANDLED_OOPS_ONLY(THREAD->clear_unhandled_oops();)

HeapWord* result = NULL; if (UseTLAB) { result = allocate_from_tlab(klass, THREAD, size); if (result != NULL) { assert(!HAS_PENDING_EXCEPTION, “Unexpected exception, will result in uninitialized storage”); return result; } } bool gc_overhead_limit_was_exceeded = false; result = Universe::heap()->mem_allocate(size, &gc_overhead_limit_was_exceeded); if (result != NULL) { NOT_PRODUCT(Universe::heap()-> check_for_non_bad_heap_word_value(result, size)); assert(!HAS_PENDING_EXCEPTION, “Unexpected exception, will result in uninitialized storage”); THREAD->incr_allocated_bytes(size * HeapWordSize);

AllocTracer::send_allocation_outside_tlab_event(klass, size * HeapWordSize);

return result;   }

if (!gc_overhead_limit_was_exceeded) { // -XX:+HeapDumpOnOutOfMemoryError and -XX:OnOutOfMemoryError support report_java_out_of_memory(“Java heap space”);

if (JvmtiExport::should_post_resource_exhausted()) {
  JvmtiExport::post_resource_exhausted(
    JVMTI_RESOURCE_EXHAUSTED_OOM_ERROR | JVMTI_RESOURCE_EXHAUSTED_JAVA_HEAP,
    "Java heap space");
}

THROW_OOP_0(Universe::out_of_memory_error_java_heap());   } else {
// -XX:+HeapDumpOnOutOfMemoryError and -XX:OnOutOfMemoryError support
report_java_out_of_memory("GC overhead limit exceeded");

if (JvmtiExport::should_post_resource_exhausted()) {
  JvmtiExport::post_resource_exhausted(
    JVMTI_RESOURCE_EXHAUSTED_OOM_ERROR | JVMTI_RESOURCE_EXHAUSTED_JAVA_HEAP,
    "GC overhead limit exceeded");
}   } } ``` 对象的分配有两个路径,一个是通过TLAB分配,一个是通过常规的Heap分配。

TLAB分配

TLAB是线程私有的一块区域,如果JVM启动的时候开启了UseTLAB参数,那么JVM就会为当前区域生成一块私有区域。每个TLAB只有当前线程可以访问,所以在生成对象的时候没有锁,所以生成对象的效率要高于普通的堆分配。TLAB的初始化是通过initialize_tlab来进行的,具体的初始化过程暂时不展开了。只看一下对应的分配过程,

inline HeapWord* ThreadLocalAllocBuffer::allocate(size_t size) {
  invariants();
  HeapWord* obj = top();
  if (pointer_delta(end(), obj) >= size) {
    // successful thread-local allocation
#ifdef ASSERT
    // Skip mangling the space corresponding to the object header to
    // ensure that the returned space is not considered parsable by
    // any concurrent GC thread.
    size_t hdr_size = oopDesc::header_size();
    Copy::fill_to_words(obj + hdr_size, size - hdr_size, badHeapWordVal);
#endif // ASSERT
    // This addition is safe because we know that top is
    // at least size below end, so the add can't wrap.
    set_top(obj + size);

    invariants();
    return obj;
  }
  return NULL;
}
  1. 首先检查当前TLAB是否还有空间分配对象,如果没有直接返回NULL
  2. 如果有空间,则将top()作为返回值,同时修改top对应的值为top+size。相当于将top指针做相应的移动。

Heap分配

HeapWord* GenCollectedHeap::mem_allocate(size_t size,
                                         bool* gc_overhead_limit_was_exceeded) {
  return collector_policy()->mem_allocate_work(size,
                                               false /* is_tlab */,
                                               gc_overhead_limit_was_exceeded);
}

HeapWord* GenCollectorPolicy::mem_allocate_work(size_t size,
                                        bool is_tlab,
                                        bool* gc_overhead_limit_was_exceeded) {
  GenCollectedHeap *gch = GenCollectedHeap::heap();

  // In general gc_overhead_limit_was_exceeded should be false so
  // set it so here and reset it to true only if the gc time
  // limit is being exceeded as checked below.
  *gc_overhead_limit_was_exceeded = false;

  HeapWord* result = NULL;

  // Loop until the allocation is satisified,
  // or unsatisfied after GC.
  for (int try_count = 1, gclocker_stalled_count = 0; /* return or throw */; try_count += 1) {
    HandleMark hm; // discard any handles allocated in each iteration

    // First allocation attempt is lock-free.
    Generation *gen0 = gch->get_gen(0);
    assert(gen0->supports_inline_contig_alloc(),
      "Otherwise, must do alloc within heap lock");
    if (gen0->should_allocate(size, is_tlab)) {
      result = gen0->par_allocate(size, is_tlab);
      if (result != NULL) {
        assert(gch->is_in_reserved(result), "result not in heap");
        return result;
      }
    }
    unsigned int gc_count_before;  // read inside the Heap_lock locked region
    {
      MutexLocker ml(Heap_lock);
      if (PrintGC && Verbose) {
        gclog_or_tty->print_cr("TwoGenerationCollectorPolicy::mem_allocate_work:"
                      " attempting locked slow path allocation");
      }
      // Note that only large objects get a shot at being
      // allocated in later generations.
      bool first_only = ! should_try_older_generation_allocation(size);

      result = gch->attempt_allocation(size, is_tlab, first_only);
      if (result != NULL) {
        assert(gch->is_in_reserved(result), "result not in heap");
        return result;
      }

      if (GC_locker::is_active_and_needs_gc()) {
        if (is_tlab) {
          return NULL;  // Caller will retry allocating individual object
        }
        if (!gch->is_maximal_no_gc()) {
          // Try and expand heap to satisfy request
          result = expand_heap_and_allocate(size, is_tlab);
          // result could be null if we are out of space
          if (result != NULL) {
            return result;
          }
        }

        if (gclocker_stalled_count > GCLockerRetryAllocationCount) {
          return NULL; // we didn't get to do a GC and we didn't get any memory
        }

        // If this thread is not in a jni critical section, we stall
        // the requestor until the critical section has cleared and
        // GC allowed. When the critical section clears, a GC is
        // initiated by the last thread exiting the critical section; so
        // we retry the allocation sequence from the beginning of the loop,
        // rather than causing more, now probably unnecessary, GC attempts.
        JavaThread* jthr = JavaThread::current();
        if (!jthr->in_critical()) {
          MutexUnlocker mul(Heap_lock);
          // Wait for JNI critical section to be exited
          GC_locker::stall_until_clear();
          gclocker_stalled_count += 1;
          continue;
        } else {
          if (CheckJNICalls) {
            fatal("Possible deadlock due to allocating while"
                  " in jni critical section");
          }
          return NULL;
        }
      }

      // Read the gc count while the heap lock is held.
      gc_count_before = Universe::heap()->total_collections();
    }

    VM_GenCollectForAllocation op(size, is_tlab, gc_count_before);
    VMThread::execute(&op);
    if (op.prologue_succeeded()) {
      result = op.result();
      if (op.gc_locked()) {
         assert(result == NULL, "must be NULL if gc_locked() is true");
         continue;  // retry and/or stall as necessary
      }

      // Allocation has failed and a collection
      // has been done.  If the gc time limit was exceeded the
      // this time, return NULL so that an out-of-memory
      // will be thrown.  Clear gc_overhead_limit_exceeded
      // so that the overhead exceeded does not persist.

      const bool limit_exceeded = size_policy()->gc_overhead_limit_exceeded();
      const bool softrefs_clear = all_soft_refs_clear();

      if (limit_exceeded && softrefs_clear) {
        *gc_overhead_limit_was_exceeded = true;
        size_policy()->set_gc_overhead_limit_exceeded(false);
        if (op.result() != NULL) {
          CollectedHeap::fill_with_object(op.result(), size);
        }
        return NULL;
      }
      assert(result == NULL || gch->is_in_reserved(result),
             "result not in heap");
      return result;
    }

    // Give a warning if we seem to be looping forever.
    if ((QueuedAllocationWarningCount > 0) &&
        (try_count % QueuedAllocationWarningCount == 0)) {
          warning("TwoGenerationCollectorPolicy::mem_allocate_work retries %d times \n\t"
                  " size=%d %s", try_count, size, is_tlab ? "(TLAB)" : "");
    }
  }
}

基于堆的分配流程比较复杂,不过大致可以分为以下的几个阶段

  1. 首先尝试进行lock free(无锁)的分配
  2. 如果无锁分配失败了,则加锁进行分配
  3. 如果加锁分配失败了,则尝试进行堆扩充,继续分配
  4. 如果仍然失败,则尝试执行gc,继续分配
  5. 否则进入下一次循环继续重试上面的流程。

lock free分配

首先看一下进入该阶段的条件,

  virtual bool should_allocate(size_t word_size, bool is_tlab) {
    assert(UseTLAB || !is_tlab, "Should not allocate tlab");

    size_t overflow_limit    = (size_t)1 << (BitsPerSize_t - LogHeapWordSize);

    const bool non_zero      = word_size > 0;
    const bool overflows     = word_size >= overflow_limit;
    const bool check_too_big = _pretenure_size_threshold_words > 0;
    const bool not_too_big   = word_size < _pretenure_size_threshold_words;
    const bool size_ok       = is_tlab || !check_too_big || not_too_big;

    bool result = !overflows &&
                  non_zero   &&
                  size_ok;

    return result;
  }

上述条件中有一条比较引人关注:size_ok。当需要分配的对象的大小超过某个阈值的时候,则不会进入该阶段。因为该阶段作用于新生代,也就是过大的对象会跳过新生代分配,进入到老年代分配。

// This version is lock-free.
inline HeapWord* ContiguousSpace::par_allocate_impl(size_t size,
                                                    HeapWord* const end_value) {
  do {
    HeapWord* obj = top();
    if (pointer_delta(end_value, obj) >= size) {
      HeapWord* new_top = obj + size;
      HeapWord* result = (HeapWord*)Atomic::cmpxchg_ptr(new_top, top_addr(), obj);
      // result can be one of two:
      //  the old top value: the exchange succeeded
      //  otherwise: the new value of the top is returned.
      if (result == obj) {
        assert(is_aligned(obj) && is_aligned(new_top), "checking alignment");
        return obj;
      }
    } else {
      return NULL;
    }
  } while (true);

lock free阶段的代码和TLAB分配的很类似,不过由于可能是多线程的,所以在过程中使用了cas来保证并发的正确,如果分配失败则返回NULL。

加锁分配

该阶段开始的时候会针对Heap_lock加锁,这个锁是互斥的。

HeapWord* GenCollectedHeap::attempt_allocation(size_t size,
                                               bool is_tlab,
                                               bool first_only) {
  HeapWord* res;
  for (int i = 0; i < _n_gens; i++) {
    if (_gens[i]->should_allocate(size, is_tlab)) {
      res = _gens[i]->allocate(size, is_tlab);
      if (res != NULL) return res;
      else if (first_only) break;
    }
  }
  // Otherwise...
  return NULL;
}

通过上面的代码可以发现该阶段会依次尝试在分代堆的各代中尝试分配对象,只要成功了则返回。如果设置了first_only,也就是只尝试使用新生代,那么当新生代分配失败的时候就会终止不会再尝试老年代。那么什么时候first_only为true呢?

bool first_only = ! should_try_older_generation_allocation(size);

bool GenCollectorPolicy::should_try_older_generation_allocation(
        size_t word_size) const {
  GenCollectedHeap* gch = GenCollectedHeap::heap();
  size_t gen0_capacity = gch->get_gen(0)->capacity_before_gc();
  return    (word_size > heap_word_size(gen0_capacity))
         || GC_locker::is_active_and_needs_gc()
         || gch->incremental_collection_failed();

有三个条件会让should_try_older_generation_allocation返回true,也就是只有这三种条件下first_only不为true,会尝试下一代分配。

我们看看新生代的allocate是如何实现的?

HeapWord* DefNewGeneration::allocate(size_t word_size,
                                     bool is_tlab) {
  // This is the slow-path allocation for the DefNewGeneration.
  // Most allocations are fast-path in compiled code.
  // We try to allocate from the eden.  If that works, we are happy.
  // Note that since DefNewGeneration supports lock-free allocation, we
  // have to use it here, as well.
  HeapWord* result = eden()->par_allocate(word_size);
  if (result != NULL) {
    if (CMSEdenChunksRecordAlways && _next_gen != NULL) {
      _next_gen->sample_eden_chunk();
    }
    return result;
  }
  do {
    HeapWord* old_limit = eden()->soft_end();
    if (old_limit < eden()->end()) {
      // Tell the next generation we reached a limit.
      HeapWord* new_limit =
        next_gen()->allocation_limit_reached(eden(), eden()->top(), word_size);
      if (new_limit != NULL) {
        Atomic::cmpxchg_ptr(new_limit, eden()->soft_end_addr(), old_limit);
      } else {
        assert(eden()->soft_end() == eden()->end(),
               "invalid state after allocation_limit_reached returned null");
      }
    } else {
      // The allocation failed and the soft limit is equal to the hard limit,
      // there are no reasons to do an attempt to allocate
      assert(old_limit == eden()->end(), "sanity check");
      break;
    }
    // Try to allocate until succeeded or the soft limit can't be adjusted
    result = eden()->par_allocate(word_size);
  } while (result == NULL);

  // If the eden is full and the last collection bailed out, we are running
  // out of heap space, and we try to allocate the from-space, too.
  // allocate_from_space can't be inlined because that would introduce a
  // circular dependency at compile time.
  if (result == NULL) {
    result = allocate_from_space(word_size);
  } else if (CMSEdenChunksRecordAlways && _next_gen != NULL) {
    _next_gen->sample_eden_chunk();
  }
  return result;
}

新生代的allocate会做如下的操作,

  1. 调用par_allocate做lock free分配
  2. 如果eden无法满足分配要求,尝试从from分配。

堆增长分配

尝试调用各代的expand_and_allocate,扩展堆的同时进行分配。新生代无法扩容,直接调用allocate进行分配。老年代则依赖各个收集算法的实现,这里以cms为例,

HeapWord*
ConcurrentMarkSweepGeneration::expand_and_allocate(size_t word_size,
                                                   bool   tlab,
                                                   bool   parallel) {
  CMSSynchronousYieldRequest yr;
  assert(!tlab, "Can't deal with TLAB allocation");
  MutexLockerEx x(freelistLock(), Mutex::_no_safepoint_check_flag);
  expand(word_size*HeapWordSize, MinHeapDeltaBytes,
    CMSExpansionCause::_satisfy_allocation);
  if (GCExpandToAllocateDelayMillis > 0) {
    os::sleep(Thread::current(), GCExpandToAllocateDelayMillis, false);
  }
  return have_lock_and_allocate(word_size, tlab);
}

首先调用expand进行扩容,然后分配对象。expand这里不做展开了。

垃圾回收分配

这里会触发垃圾回收机制,这里也不做展开了。

对象初始化

经过上面的步骤对象所需要的内存就分配好了,下面需要开始初始化对象了。

void CollectedHeap::init_obj(HeapWord* obj, size_t size) {
  assert(obj != NULL, "cannot initialize NULL object");
  const size_t hs = oopDesc::header_size();
  assert(size >= hs, "unexpected object size");
  ((oop)obj)->set_klass_gap(0);
  Copy::fill_to_aligned_words(obj + hs, size - hs);
}

这个步骤会设置对象头,

void CollectedHeap::post_allocation_setup_no_klass_install(KlassHandle klass,
                                                           HeapWord* objPtr) {
  oop obj = (oop)objPtr;

  assert(obj != NULL, "NULL object pointer");
  if (UseBiasedLocking && (klass() != NULL)) {
    obj->set_mark(klass->prototype_header());
  } else {
    // May be bootstrapping
    obj->set_mark(markOopDesc::prototype());
  }
}

构造函数调用

当通过new分配了对象后,还需要调用构造函数进行实际的初始化。

4: invokespecial #5                  // Method "<init>":()V

jvm会通过类似上面的指令执行构造函数。拿之前的例子为例,

  public Test();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=2, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: aload_0
         5: ldc           #2                  // String abcdefg
         7: putfield      #3                  // Field str:Ljava/lang/String;
        10: return
      LineNumberTable:
        line 1: 0
        line 2: 4
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0      11     0  this   LTest;

我们可以看到构造函数对str属性进行了赋值,对应的值是abcdefg。