Analysis of the ART Object Memory Allocation Process - Preparation Stage for Memory Allocation (Android 8.1)

Keywords: Android Java

Note: This analysis is based on Android 8.1.

Analysis of the ART Object Allocation Process - Preparation Stage for Memory Allocation

This chapter analyzes how the ART virtual machine allocates memory when it creates an object in Android 8.1. This section covers the setup that precedes the allocation itself and the chain of jumps that leads to it.

Let's start with the Thread class.

Thread class

The Init() method of the Thread class performs all thread-related initialization, such as initializing CPU information. Its member function InitTlsEntryPoints() initializes the jump tables used for calls to external library (runtime) functions. Earlier ART releases divided these jump tables into four categories: interpreter_entrypoints_ held the entries used by the interpreter, jni_entrypoints_ the entries associated with JNI calls, portable_entrypoints_ the entries used by native code generated by the Portable backend, and quick_entrypoints_ the entries used by native code generated by the Quick backend. In the Android 8.1 code shown below, only tlsPtr_.jni_entrypoints and tlsPtr_.quick_entrypoints are initialized. Because the tables are stored inside the Thread object, each entry is reached at a fixed offset from the current Thread.

Thread's Init method:

bool Thread::Init(ThreadList* thread_list, JavaVMExt* java_vm, JNIEnvExt* jni_env_ext) {
  // This function does all the initialization that must be run by the native thread it applies to.
  // (When we create a new thread from managed code, we allocate the Thread* in Thread::Create so
  // we can handshake with the corresponding native thread when it's ready.) Check this native
  // thread hasn't been through here already...
  CHECK(Thread::Current() == nullptr);

  // Set pthread_self_ ahead of pthread_setspecific, that makes Thread::Current function, this
  // avoids pthread_self_ ever being invalid when discovered from Thread::Current().
  tlsPtr_.pthread_self = pthread_self();
  CHECK(is_started_);

  SetUpAlternateSignalStack();
  if (!InitStackHwm()) {
    return false;
  }
  InitCpu();
  InitTlsEntryPoints();
  RemoveSuspendTrigger();
  InitCardTable();
  InitTid();
  interpreter::InitInterpreterTls(this);
  ......
  thread_list->Register(this);
  return true;
}

Thread's InitTlsEntryPoints() method:

void Thread::InitTlsEntryPoints() {
  // Insert a placeholder so we can easily tell if we call an unimplemented entry point.
  uintptr_t* begin = reinterpret_cast<uintptr_t*>(&tlsPtr_.jni_entrypoints);
  uintptr_t* end = reinterpret_cast<uintptr_t*>(
      reinterpret_cast<uint8_t*>(&tlsPtr_.quick_entrypoints) + sizeof(tlsPtr_.quick_entrypoints));
  for (uintptr_t* it = begin; it != end; ++it) {
    *it = reinterpret_cast<uintptr_t>(UnimplementedEntryPoint);
  }
  InitEntryPoints(&tlsPtr_.jni_entrypoints, &tlsPtr_.quick_entrypoints);
}
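
The placeholder technique used by InitTlsEntryPoints() can be illustrated with a small standalone sketch. This is not ART code: DemoThread, DemoSlot, and the Demo* functions are hypothetical names, the table is modeled as a std::array instead of ART's structs of named function pointers, and the placeholder returns a marker string where ART's UnimplementedEntryPoint would abort.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical slot indices; ART uses named struct members instead.
enum DemoSlot : std::size_t {
  kDlsymLookup,
  kAllocObjectResolved,
  kAllocObjectInitialized,
  kNumSlots
};

using DemoEntryPoint = std::string (*)();

// Placeholder entry. It returns a marker so the effect is observable
// without crashing the process.
std::string DemoUnimplemented() { return "unimplemented"; }
std::string DemoDlsymLookup() { return "dlsym lookup"; }
std::string DemoAllocResolved() { return "alloc resolved"; }

struct DemoThread {
  std::array<DemoEntryPoint, kNumSlots> entrypoints{};

  void InitTlsEntryPoints() {
    // Step 1: fill every slot with the placeholder.
    entrypoints.fill(&DemoUnimplemented);
    // Step 2: overwrite the slots that actually have an implementation.
    entrypoints[kDlsymLookup] = &DemoDlsymLookup;
    entrypoints[kAllocObjectResolved] = &DemoAllocResolved;
  }

  // Calls through the table, the way compiled code jumps through a slot.
  std::string Call(DemoSlot slot) const {
    assert(entrypoints[slot] != nullptr);
    return entrypoints[slot]();
  }
};
```

After InitTlsEntryPoints(), calling the kAllocObjectInitialized slot in this sketch yields "unimplemented", which makes a forgotten entry easy to spot.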

entrypoints directory

Thread's InitTlsEntryPoints() method calls InitEntryPoints() and passes it the addresses of the two entry point tables. InitEntryPoints() is implemented separately for each CPU architecture. Let's look at the ARM64 implementation (/art/runtime/arch/arm64/entrypoints_init_arm64.cc):

void InitEntryPoints(JniEntryPoints* jpoints, QuickEntryPoints* qpoints) {
  DefaultInitEntryPoints(jpoints, qpoints);
  ......
}

This calls the DefaultInitEntryPoints() method (/art/runtime/entrypoints/quick/quick_default_init_entrypoints.h):

static void DefaultInitEntryPoints(JniEntryPoints* jpoints, QuickEntryPoints* qpoints) {
  // JNI
  jpoints->pDlsymLookup = art_jni_dlsym_lookup_stub;

  // Alloc
  ResetQuickAllocEntryPoints(qpoints, /* is_marking */ true);
  ......
}

We focus only on the Alloc section, which goes on to call the ResetQuickAllocEntryPoints() method.

Location: /art/runtime/entrypoints/quick/quick_alloc_entrypoints.cc

static gc::AllocatorType entry_points_allocator = gc::kAllocatorTypeDlMalloc;

void SetQuickAllocEntryPointsAllocator(gc::AllocatorType allocator) {
  entry_points_allocator = allocator;
}

// entry_points_instrumented, used below, is declared outside this excerpt.
void ResetQuickAllocEntryPoints(QuickEntryPoints* qpoints, bool is_marking) {
#if !defined(__APPLE__) || !defined(__LP64__)
  switch (entry_points_allocator) {
    case gc::kAllocatorTypeDlMalloc: {
      SetQuickAllocEntryPoints_dlmalloc(qpoints, entry_points_instrumented);
      return;
    }
    case gc::kAllocatorTypeRosAlloc: {
      SetQuickAllocEntryPoints_rosalloc(qpoints, entry_points_instrumented);
      return;
    }
    case gc::kAllocatorTypeBumpPointer: {
      CHECK(kMovingCollector);
      SetQuickAllocEntryPoints_bump_pointer(qpoints, entry_points_instrumented);
      return;
    }
    case gc::kAllocatorTypeTLAB: {
      CHECK(kMovingCollector);
      SetQuickAllocEntryPoints_tlab(qpoints, entry_points_instrumented);
      return;
    }
    case gc::kAllocatorTypeRegion: {
      CHECK(kMovingCollector);
      SetQuickAllocEntryPoints_region(qpoints, entry_points_instrumented);
      return;
    }
    case gc::kAllocatorTypeRegionTLAB: {
      CHECK(kMovingCollector);
      if (is_marking) {
        SetQuickAllocEntryPoints_region_tlab(qpoints, entry_points_instrumented);
      } else {
        // Not marking means we need no read barriers and can just use the normal TLAB case.
        SetQuickAllocEntryPoints_tlab(qpoints, entry_points_instrumented);
      }
      return;
    }
    default:
      break;
  }
#else
  UNUSED(qpoints);
  UNUSED(is_marking);
#endif
  UNIMPLEMENTED(FATAL);
  UNREACHABLE();
}

  • entry_points_allocator holds the type of memory allocator. Its initial value, kAllocatorTypeDlMalloc, selects the DlMalloc allocator entry points. The value can be changed by calling SetQuickAllocEntryPointsAllocator(). In practice the value depends on the garbage collector in use: with the mark-sweep collectors it is usually kAllocatorTypeRosAlloc, while the concurrent copying collector uses the region-based allocators.

  • SetQuickAllocEntryPointsAllocator() is called by ChangeAllocator() when the allocator changes, and ChangeAllocator() is in turn called by ChangeCollector() when the garbage collector type changes.
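
The selection logic of ResetQuickAllocEntryPoints() can be reduced to the following sketch. It is a simplified model under stated assumptions, not the ART implementation: the Demo* types and functions are hypothetical, only one entry point is modeled, instrumentation is left out, and the function returns false where ART would hit UNIMPLEMENTED(FATAL).

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of gc::AllocatorType.
enum class DemoAllocatorType { kDlMalloc, kRosAlloc, kTLAB, kRegionTLAB };

// Hypothetical table with a single allocation entry point.
struct DemoQuickEntryPoints {
  std::string (*pAllocObjectResolved)() = nullptr;
};

std::string DemoAllocDlMalloc() { return "dlmalloc"; }
std::string DemoAllocRosAlloc() { return "rosalloc"; }
std::string DemoAllocTlab() { return "tlab"; }
std::string DemoAllocRegionTlab() { return "region_tlab"; }

// File-level state, like entry_points_allocator in the excerpt above.
static DemoAllocatorType demo_entry_points_allocator = DemoAllocatorType::kDlMalloc;

void DemoSetAllocator(DemoAllocatorType allocator) {
  demo_entry_points_allocator = allocator;
}

// Installs the entry point that matches the current allocator type.
// Returns false if the allocator type is not handled.
bool DemoResetQuickAllocEntryPoints(DemoQuickEntryPoints* qpoints, bool is_marking) {
  assert(qpoints != nullptr);
  switch (demo_entry_points_allocator) {
    case DemoAllocatorType::kDlMalloc:
      qpoints->pAllocObjectResolved = &DemoAllocDlMalloc;
      return true;
    case DemoAllocatorType::kRosAlloc:
      qpoints->pAllocObjectResolved = &DemoAllocRosAlloc;
      return true;
    case DemoAllocatorType::kTLAB:
      qpoints->pAllocObjectResolved = &DemoAllocTlab;
      return true;
    case DemoAllocatorType::kRegionTLAB:
      // When the collector is not marking, the plain TLAB entry is enough.
      qpoints->pAllocObjectResolved = is_marking ? &DemoAllocRegionTlab : &DemoAllocTlab;
      return true;
  }
  return false;
}
```

Changing the allocator with DemoSetAllocator() has no effect on a table until DemoResetQuickAllocEntryPoints() is run on it again, which mirrors the two-step pattern in the excerpt.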

ResetQuickAllocEntryPoints() calls a function whose name is SetQuickAllocEntryPoints followed by an allocator-specific suffix. Where are these functions defined? Let's continue.

/art/runtime/entrypoints/quick/quick_alloc_entrypoints.cc:

#define GENERATE_ENTRYPOINTS(suffix) \
extern "C" void* art_quick_alloc_array_resolved##suffix(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved8##suffix(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved16##suffix(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved32##suffix(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved64##suffix(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_object_resolved##suffix(mirror::Class* klass); \
extern "C" void* art_quick_alloc_object_initialized##suffix(mirror::Class* klass); \
extern "C" void* art_quick_alloc_object_with_checks##suffix(mirror::Class* klass); \
extern "C" void* art_quick_alloc_string_from_bytes##suffix(void*, int32_t, int32_t, int32_t); \
extern "C" void* art_quick_alloc_string_from_chars##suffix(int32_t, int32_t, void*); \
extern "C" void* art_quick_alloc_string_from_string##suffix(void*); \
extern "C" void* art_quick_alloc_array_resolved##suffix##_instrumented(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved8##suffix##_instrumented(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved16##suffix##_instrumented(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved32##suffix##_instrumented(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_array_resolved64##suffix##_instrumented(mirror::Class* klass, int32_t); \
extern "C" void* art_quick_alloc_object_resolved##suffix##_instrumented(mirror::Class* klass); \
extern "C" void* art_quick_alloc_object_initialized##suffix##_instrumented(mirror::Class* klass); \
extern "C" void* art_quick_alloc_object_with_checks##suffix##_instrumented(mirror::Class* klass); \
extern "C" void* art_quick_alloc_string_from_bytes##suffix##_instrumented(void*, int32_t, int32_t, int32_t); \
extern "C" void* art_quick_alloc_string_from_chars##suffix##_instrumented(int32_t, int32_t, void*); \
extern "C" void* art_quick_alloc_string_from_string##suffix##_instrumented(void*); \
void SetQuickAllocEntryPoints##suffix(QuickEntryPoints* qpoints, bool instrumented) { \
  if (instrumented) { \
    qpoints->pAllocArrayResolved = art_quick_alloc_array_resolved##suffix##_instrumented; \
    qpoints->pAllocArrayResolved8 = art_quick_alloc_array_resolved8##suffix##_instrumented; \
    qpoints->pAllocArrayResolved16 = art_quick_alloc_array_resolved16##suffix##_instrumented; \
    qpoints->pAllocArrayResolved32 = art_quick_alloc_array_resolved32##suffix##_instrumented; \
    qpoints->pAllocArrayResolved64 = art_quick_alloc_array_resolved64##suffix##_instrumented; \
    qpoints->pAllocObjectResolved = art_quick_alloc_object_resolved##suffix##_instrumented; \
    qpoints->pAllocObjectInitialized = art_quick_alloc_object_initialized##suffix##_instrumented; \
    qpoints->pAllocObjectWithChecks = art_quick_alloc_object_with_checks##suffix##_instrumented; \
    qpoints->pAllocStringFromBytes = art_quick_alloc_string_from_bytes##suffix##_instrumented; \
    qpoints->pAllocStringFromChars = art_quick_alloc_string_from_chars##suffix##_instrumented; \
    qpoints->pAllocStringFromString = art_quick_alloc_string_from_string##suffix##_instrumented; \
  } else { \
    qpoints->pAllocArrayResolved = art_quick_alloc_array_resolved##suffix; \
    qpoints->pAllocArrayResolved8 = art_quick_alloc_array_resolved8##suffix; \
    qpoints->pAllocArrayResolved16 = art_quick_alloc_array_resolved16##suffix; \
    qpoints->pAllocArrayResolved32 = art_quick_alloc_array_resolved32##suffix; \
    qpoints->pAllocArrayResolved64 = art_quick_alloc_array_resolved64##suffix; \
    qpoints->pAllocObjectResolved = art_quick_alloc_object_resolved##suffix; \
    qpoints->pAllocObjectInitialized = art_quick_alloc_object_initialized##suffix; \
    qpoints->pAllocObjectWithChecks = art_quick_alloc_object_with_checks##suffix; \
    qpoints->pAllocStringFromBytes = art_quick_alloc_string_from_bytes##suffix; \
    qpoints->pAllocStringFromChars = art_quick_alloc_string_from_chars##suffix; \
    qpoints->pAllocStringFromString = art_quick_alloc_string_from_string##suffix; \
  } \
}
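
The macro relies on preprocessor token pasting (##) to stamp out one setter per allocator suffix. A minimal sketch of the same technique follows. The names are hypothetical, only one table slot is modeled, and the generated functions are ordinary C++ functions here, whereas in ART the art_quick_alloc_* symbols are declared extern "C" and defined elsewhere.

```cpp
#include <cassert>
#include <string>

// Hypothetical table with a single slot.
struct DemoQuickEntryPoints {
  std::string (*pAllocObjectResolved)() = nullptr;
};

// For a given suffix, generates two entry functions and the setter that
// installs one of them. #suffix stringizes, ## pastes tokens together.
#define DEMO_GENERATE_ENTRYPOINTS(suffix)                                  \
  std::string demo_alloc_object_resolved##suffix() {                       \
    return "resolved" #suffix;                                             \
  }                                                                        \
  std::string demo_alloc_object_resolved##suffix##_instrumented() {        \
    return "resolved" #suffix "_instrumented";                             \
  }                                                                        \
  void DemoSetQuickAllocEntryPoints##suffix(DemoQuickEntryPoints* qpoints, \
                                            bool instrumented) {           \
    assert(qpoints != nullptr);                                            \
    qpoints->pAllocObjectResolved =                                        \
        instrumented ? demo_alloc_object_resolved##suffix##_instrumented   \
                     : demo_alloc_object_resolved##suffix;                 \
  }

// Each expansion defines DemoSetQuickAllocEntryPoints_<allocator>.
DEMO_GENERATE_ENTRYPOINTS(_rosalloc)
DEMO_GENERATE_ENTRYPOINTS(_tlab)
```

In this sketch, DemoSetQuickAllocEntryPoints_rosalloc(&q, false) installs a function that returns "resolved_rosalloc", and DemoSetQuickAllocEntryPoints_tlab(&q, true) installs one that returns "resolved_tlab_instrumented".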

Take pAllocObjectResolved with the RosAlloc allocator as an example. The entry point art_quick_alloc_object_resolved_rosalloc is an assembly stub that ends up calling (with a bl instruction on ARM) the C++ function artAllocObjectFromCodeResolvedRosAlloc. As the declarations above show, the stub takes a single argument, klass, which describes the class of the object to be allocated; the C++ function additionally receives the current thread as self.

The function artAllocObjectFromCodeResolvedRosAlloc is itself generated by the following macro (/art/runtime/entrypoints/quick/quick_alloc_entrypoints.cc):

#define GENERATE_ENTRYPOINTS_FOR_ALLOCATOR_INST(suffix, suffix2, instrumented_bool, allocator_type) \
extern "C" mirror::Object* artAllocObjectFromCodeWithChecks##suffix##suffix2( \
    mirror::Class* klass, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  return artAllocObjectFromCode<false, true, instrumented_bool, allocator_type>(klass, self); \
} \
extern "C" mirror::Object* artAllocObjectFromCodeResolved##suffix##suffix2( \
    mirror::Class* klass, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  return artAllocObjectFromCode<false, false, instrumented_bool, allocator_type>(klass, self); \
} \
extern "C" mirror::Object* artAllocObjectFromCodeInitialized##suffix##suffix2( \
    mirror::Class* klass, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  return artAllocObjectFromCode<true, false, instrumented_bool, allocator_type>(klass, self); \
} \
extern "C" mirror::Array* artAllocArrayFromCodeResolved##suffix##suffix2( \
    mirror::Class* klass, int32_t component_count, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  ScopedQuickEntrypointChecks sqec(self); \
  return AllocArrayFromCodeResolved<instrumented_bool>(klass, component_count, self, \
                                                       allocator_type); \
} \
extern "C" mirror::String* artAllocStringFromBytesFromCode##suffix##suffix2( \
    mirror::ByteArray* byte_array, int32_t high, int32_t offset, int32_t byte_count, \
    Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  ScopedQuickEntrypointChecks sqec(self); \
  StackHandleScope<1> hs(self); \
  Handle<mirror::ByteArray> handle_array(hs.NewHandle(byte_array)); \
  return mirror::String::AllocFromByteArray<instrumented_bool>(self, byte_count, handle_array, \
                                                               offset, high, allocator_type); \
} \
extern "C" mirror::String* artAllocStringFromCharsFromCode##suffix##suffix2( \
    int32_t offset, int32_t char_count, mirror::CharArray* char_array, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  StackHandleScope<1> hs(self); \
  Handle<mirror::CharArray> handle_array(hs.NewHandle(char_array)); \
  return mirror::String::AllocFromCharArray<instrumented_bool>(self, char_count, handle_array, \
                                                               offset, allocator_type); \
} \
extern "C" mirror::String* artAllocStringFromStringFromCode##suffix##suffix2( /* NOLINT */ \
    mirror::String* string, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  StackHandleScope<1> hs(self); \
  Handle<mirror::String> handle_string(hs.NewHandle(string)); \
  return mirror::String::AllocFromString<instrumented_bool>(self, handle_string->GetLength(), \
                                                            handle_string, 0, allocator_type); \
}

#define GENERATE_ENTRYPOINTS_FOR_ALLOCATOR(suffix, allocator_type) \
    GENERATE_ENTRYPOINTS_FOR_ALLOCATOR_INST(suffix, Instrumented, true, allocator_type) \
    GENERATE_ENTRYPOINTS_FOR_ALLOCATOR_INST(suffix, , false, allocator_type)
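
The two-level macro above combines token pasting with template arguments, and it passes an empty suffix2 for the non-instrumented variant. The sketch below reproduces that pattern in isolation. All Demo* names are hypothetical, and the template only reports which combination of flags it was instantiated with instead of allocating anything.

```cpp
#include <cassert>
#include <string>

enum class DemoAllocatorType { kRosAlloc, kTLAB };

struct DemoClass {
  std::string name;
};

// Stand-in for artAllocObjectFromCode: describes its template arguments.
template <bool kInitialized, bool kFinalize, bool kInstrumented, DemoAllocatorType kAllocator>
std::string DemoAllocObjectFromCode(const DemoClass* klass) {
  assert(klass != nullptr);  // mirrors DCHECK(klass != nullptr)
  std::string s = klass->name;
  s += (kAllocator == DemoAllocatorType::kRosAlloc) ? " rosalloc" : " tlab";
  s += kInitialized ? " initialized" : "";
  s += kFinalize ? " with-checks" : "";
  s += kInstrumented ? " instrumented" : "";
  return s;
}

// Generates three wrappers per (suffix, suffix2) pair.
#define DEMO_ENTRYPOINTS_INST(suffix, suffix2, instrumented_bool, allocator_type)          \
  std::string DemoAllocWithChecks##suffix##suffix2(const DemoClass* klass) {               \
    return DemoAllocObjectFromCode<false, true, instrumented_bool, allocator_type>(klass); \
  }                                                                                        \
  std::string DemoAllocResolved##suffix##suffix2(const DemoClass* klass) {                 \
    return DemoAllocObjectFromCode<false, false, instrumented_bool, allocator_type>(klass);\
  }                                                                                        \
  std::string DemoAllocInitialized##suffix##suffix2(const DemoClass* klass) {              \
    return DemoAllocObjectFromCode<true, false, instrumented_bool, allocator_type>(klass); \
  }

// The second expansion passes an empty suffix2, so the names get no extra suffix.
#define DEMO_ENTRYPOINTS(suffix, allocator_type)                    \
  DEMO_ENTRYPOINTS_INST(suffix, Instrumented, true, allocator_type) \
  DEMO_ENTRYPOINTS_INST(suffix, , false, allocator_type)

DEMO_ENTRYPOINTS(RosAlloc, DemoAllocatorType::kRosAlloc)
DEMO_ENTRYPOINTS(TLAB, DemoAllocatorType::kTLAB)
```

Each DEMO_ENTRYPOINTS line therefore produces six functions. For a class named "Foo", DemoAllocResolvedRosAlloc returns "Foo rosalloc" and DemoAllocWithChecksTLABInstrumented returns "Foo tlab with-checks instrumented".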

Each generated object-allocation function ends up calling the artAllocObjectFromCode() method (/art/runtime/entrypoints/quick/quick_alloc_entrypoints.cc):

static constexpr bool kUseTlabFastPath = true;

template <bool kInitialized,
          bool kFinalize,
          bool kInstrumented,
          gc::AllocatorType allocator_type>
static ALWAYS_INLINE inline mirror::Object* artAllocObjectFromCode(
    mirror::Class* klass,
    Thread* self) REQUIRES_SHARED(Locks::mutator_lock_) {
  ScopedQuickEntrypointChecks sqec(self);
  DCHECK(klass != nullptr);
  if (kUseTlabFastPath && !kInstrumented && allocator_type == gc::kAllocatorTypeTLAB) {
    if (kInitialized || klass->IsInitialized()) {
      if (!kFinalize || !klass->IsFinalizable()) {
        size_t byte_count = klass->GetObjectSize();
        byte_count = RoundUp(byte_count, gc::space::BumpPointerSpace::kAlignment);
        mirror::Object* obj;
        if (LIKELY(byte_count < self->TlabSize())) {
          obj = self->AllocTlab(byte_count);
          DCHECK(obj != nullptr) << "AllocTlab can't fail";
          obj->SetClass(klass);
          if (kUseBakerReadBarrier) {
            obj->AssertReadBarrierState();
          }
          QuasiAtomic::ThreadFenceForConstructor();
          return obj;
        }
      }
    }
  }
  if (kInitialized) {
    return AllocObjectFromCodeInitialized<kInstrumented>(klass, self, allocator_type);
  } else if (!kFinalize) {
    return AllocObjectFromCodeResolved<kInstrumented>(klass, self, allocator_type);
  } else {
    return AllocObjectFromCode<kInstrumented>(klass, self, allocator_type);
  }
}

This method does the following:

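  • First, it determines whether the object can be allocated from a TLAB (thread-local allocation buffer). A TLAB is a region of memory reserved for a single thread, which lets the thread allocate without synchronizing with other threads and so speeds up allocation. The fast path is taken only when the allocation is not instrumented, the allocator type is kAllocatorTypeTLAB, the class is initialized, no finalizer has to be registered, and the rounded-up object size is smaller than the space left in the TLAB. In that case, the Thread object's AllocTlab() method performs the allocation.

  • Otherwise, the branch is chosen from the template parameters kInitialized and kFinalize. If kInitialized is true (the class is known to be initialized), AllocObjectFromCodeInitialized() is called; otherwise, if kFinalize is false, AllocObjectFromCodeResolved() is called; in the remaining case, AllocObjectFromCode() is called.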
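
The TLAB fast path described above is a bump-pointer allocation, which the following sketch models. It is an illustration under assumptions, not ART's code: DemoThread, DemoAllocObject, and the 8-byte kDemoAlignment are hypothetical, the class checks are reduced to two booleans, and the slow path is only reported, not implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kDemoAlignment = 8;  // assumed alignment for the sketch

constexpr std::size_t DemoRoundUp(std::size_t x, std::size_t n) {
  return (x + n - 1) / n * n;
}

// Per-thread buffer described by two pointers.
struct DemoThread {
  std::uint8_t* tlab_pos = nullptr;
  std::uint8_t* tlab_end = nullptr;

  std::size_t TlabSize() const {
    return static_cast<std::size_t>(tlab_end - tlab_pos);
  }

  // Bump-pointer allocation. No lock is needed because only the owning
  // thread touches this buffer.
  void* AllocTlab(std::size_t bytes) {
    assert(bytes <= TlabSize());
    void* result = tlab_pos;
    tlab_pos += bytes;
    return result;
  }
};

enum class DemoPath { kTlabFastPath, kSlowPath };

struct DemoResult {
  DemoPath path;
  void* memory;
};

DemoResult DemoAllocObject(DemoThread* self, std::size_t object_size,
                           bool initialized, bool finalizable) {
  assert(self != nullptr);
  if (initialized && !finalizable) {
    const std::size_t byte_count = DemoRoundUp(object_size, kDemoAlignment);
    if (byte_count < self->TlabSize()) {
      return {DemoPath::kTlabFastPath, self->AllocTlab(byte_count)};
    }
  }
  // A real runtime would fall back to the general allocator here.
  return {DemoPath::kSlowPath, nullptr};
}
```

With a 64-byte buffer, a 12-byte request is rounded up to 16 bytes and served from the start of the buffer, leaving 48 bytes. A finalizable class, or a request that does not fit in the remaining space, takes the slow path.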

Let's look at the AllocObjectFromCodeResolved method (/art/runtime/entrypoints/entrypoint_utils-inl.h):

// Given the context of a calling Method and a resolved class, create an instance.
template <bool kInstrumented>
ALWAYS_INLINE
inline mirror::Object* AllocObjectFromCodeResolved(mirror::Class* klass,
                                                   Thread* self,
                                                   gc::AllocatorType allocator_type) {
  DCHECK(klass != nullptr);
  bool slow_path = false;
  klass = CheckClassInitializedForObjectAlloc(klass, self, &slow_path);
  if (UNLIKELY(slow_path)) {
    if (klass == nullptr) {
      return nullptr;
    }
    gc::Heap* heap = Runtime::Current()->GetHeap();
    // Pass in false since the object cannot be finalizable.
    // CheckClassInitializedForObjectAlloc can cause thread suspension which means we may now be
    // instrumented.
    return klass->Alloc</*kInstrumented*/true, false>(self, heap->GetCurrentAllocator()).Ptr();
  }
  // Pass in false since the object cannot be finalizable.
  return klass->Alloc<kInstrumented, false>(self, allocator_type).Ptr();
}

AllocObjectFromCodeResolved() first calls CheckClassInitializedForObjectAlloc() to determine whether the class still has to be initialized. slow_path starts out as false and is set to true if initialization is required. CheckClassInitializedForObjectAlloc() returns the class of the object to be allocated; a null result on the slow path is treated as a failure, and nullptr is returned. If klass is not null, memory is allocated for an object of that class by calling the Alloc() method of klass.
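
The slow_path out-parameter pattern can be sketched as follows. The body of CheckClassInitializedForObjectAlloc() is not shown in this article, so the helper below is an assumption modeled on how its caller uses it: it is taken to initialize the class when needed, to flag the slow path, and to return nullptr on failure. All Demo* names are hypothetical.

```cpp
#include <cassert>

// Hypothetical class descriptor with just enough state for the sketch.
struct DemoClass {
  bool initialized;
  bool init_fails;  // simulates a class initializer that throws
};

// Assumed behavior: initialize on demand, report that via *slow_path.
DemoClass* DemoCheckClassInitialized(DemoClass* klass, bool* slow_path) {
  if (!klass->initialized) {
    *slow_path = true;
    if (klass->init_fails) {
      return nullptr;  // initialization failed
    }
    klass->initialized = true;
  }
  return klass;
}

enum class DemoOutcome { kFastAlloc, kSlowAlloc, kFailed };

// Mirrors the control flow of AllocObjectFromCodeResolved.
DemoOutcome DemoAllocResolved(DemoClass* klass) {
  assert(klass != nullptr);
  bool slow_path = false;
  klass = DemoCheckClassInitialized(klass, &slow_path);
  if (slow_path) {
    if (klass == nullptr) {
      return DemoOutcome::kFailed;
    }
    // The excerpt re-reads the current allocator on this path.
    return DemoOutcome::kSlowAlloc;
  }
  return DemoOutcome::kFastAlloc;
}
```

An already initialized class takes the fast branch, a class that initializes successfully takes the slow branch once and the fast branch afterwards, and a class whose initialization fails yields kFailed.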

Alloc method: (/art/runtime/mirror/class-inl.h)

template<bool kIsInstrumented, bool kCheckAddFinalizer>
inline ObjPtr<Object> Class::Alloc(Thread* self, gc::AllocatorType allocator_type) {
  CheckObjectAlloc();
  gc::Heap* heap = Runtime::Current()->GetHeap();
  const bool add_finalizer = kCheckAddFinalizer && IsFinalizable();
  if (!kCheckAddFinalizer) {
    DCHECK(!IsFinalizable());
  }
  // Note that the this pointer may be invalidated after the allocation.
  ObjPtr<Object> obj =
      heap->AllocObjectWithAllocator<kIsInstrumented, false>(self,
                                                             this,
                                                             this->object_size_,
                                                             allocator_type,
                                                             VoidFunctor());
  if (add_finalizer && LIKELY(obj != nullptr)) {
    heap->AddFinalizerReference(self, &obj);
    if (UNLIKELY(self->IsExceptionPending())) {
      // Failed to allocate finalizer reference, it means that the whole allocation failed.
      obj = nullptr;
    }
  }
  return obj.Ptr();
}

  1. The CheckObjectAlloc() method checks that an object of this class may legally be allocated.

  2. heap->AllocObjectWithAllocator() is called to allocate memory for the object.

  3. A finalizer check follows. If the class overrides the finalize() method and kCheckAddFinalizer is true, heap->AddFinalizerReference(self, &obj) is called. This creates a FinalizerReference object through the add() method of FinalizerReference.java and adds it to a linked list. Before the object is reclaimed, its finalize() method is executed.
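
The role of the kCheckAddFinalizer template parameter in Alloc() can be shown with a small model. This is a sketch with hypothetical names: DemoHeap hands out integer ids instead of memory, and its finalizer_refs vector stands in for the FinalizerReference list.

```cpp
#include <cassert>
#include <vector>

// Hypothetical heap: objects are ids, and finalizable ones are recorded.
struct DemoHeap {
  std::vector<int> finalizer_refs;
  int next_id = 1;

  int AllocObject() { return next_id++; }
  void AddFinalizerReference(int id) { finalizer_refs.push_back(id); }
};

struct DemoClass {
  bool finalizable;

  // kCheckAddFinalizer is a compile-time switch: callers that already know
  // the class is not finalizable pass false and skip the check.
  template <bool kCheckAddFinalizer>
  int Alloc(DemoHeap* heap) const {
    const bool add_finalizer = kCheckAddFinalizer && finalizable;
    if (!kCheckAddFinalizer) {
      assert(!finalizable);  // mirrors DCHECK(!IsFinalizable())
    }
    const int obj = heap->AllocObject();
    if (add_finalizer) {
      heap->AddFinalizerReference(obj);
    }
    return obj;
  }
};
```

Only the combination of kCheckAddFinalizer being true and a finalizable class adds an entry to finalizer_refs; every other call just allocates.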

At this point, the object's memory allocation enters the Heap-related allocation phase, which we will cover in the next section.

Summary

  1. The Thread class initializes the jump tables for calls to external library (runtime) functions. Because the tables are stored in the Thread object, each entry is reached at a fixed offset from the current Thread.

  2. Thread's InitTlsEntryPoints() method calls InitEntryPoints() and passes it the addresses of the entry point tables. The implementation differs by CPU architecture; the ARM64 implementation, for example, is in /art/runtime/arch/arm64/entrypoints_init_arm64.cc.

  3. entry_points_allocator holds the type of memory allocator. Its initial value, kAllocatorTypeDlMalloc, selects the DlMalloc allocator entry points. The value can be changed by calling SetQuickAllocEntryPointsAllocator(). In practice the value depends on the garbage collector in use: with the mark-sweep collectors it is usually kAllocatorTypeRosAlloc, while the concurrent copying collector uses the region-based allocators.

  4. The artAllocObjectFromCode() method (/art/runtime/entrypoints/quick/quick_alloc_entrypoints.cc) chooses among several allocation paths based on conditions such as whether the TLAB fast path applies and whether the class is known to be initialized.

  5. Eventually, heap->AllocObjectWithAllocator is called to allocate the object's memory.

Posted by Coreye on Fri, 16 Aug 2019 23:05:42 -0700