CommonAPI in Android VHAL


Table of Contents

  1. Executive Summary
  2. Quick Start Guide
  3. System Architecture
  4. Problem Manifestation
  5. Root Cause Analysis
  6. Execution Timeline Analysis
  7. Technical Deep Dive
  8. Alternative Solutions
  9. Prevention Guidelines
  10. Appendix: Verification Commands
  11. Glossary

1. Executive Summary

This document analyzes a critical C++ dynamic linking issue encountered when integrating CommonAPI (SOME/IP) with FakeVHAL (Fake Vehicle Hardware Abstraction Layer) on Android 15 AOSP for Raspberry Pi 5.

Core Issue: C++ inline static member functions created duplicate singleton instances across the boundary between the main executable (android.hardware.automotive.vehicle@V3-default-service) and the shared library (libFakeVehicleHardware-V3.so). This caused CommonAPI's factory registration to bind to one instance while application logic used another, resulting in null stub/proxy creation and silent failures.

Resolution: Moved all CommonAPI lifecycle management to the main executable, implementing a bridge pattern to expose functionality to FakeVehicleHardware via static singleton accessors, ensuring single-instance semantics across the entire process.


2. Quick Start Guide

To integrate CommonAPI with FakeVHAL in your AOSP 15 build, follow these steps:

Step 1: Replace Build Configuration

Replace the Android.bp in your AOSP tree:

# Copy from this repository to your AOSP source
cp Android.bp $AOSP_ROOT/hardware/interfaces/automotive/vehicle/aidl/

Step 2: Integrate Source Files

Navigate to the implementation directory:

cd $AOSP_ROOT/hardware/interfaces/automotive/vehicle/aidl/impl/3

Copy the following components from this repository:

  1. Main Service (vhal/ directory):

    • Contains DefaultVehicleHal with CommonAPI initialization in the main executable context
    • Includes commonapi_bridge/ with stub/proxy management
    • Android.bp configured to link CommonAPI libraries to main service only
  2. Hardware Implementation (hardware/ directory):

    • FakeVehicleHardware implementation without CommonAPI headers in public interface
    • Uses bridge pattern to access CommonAPI objects through exported functions
    • Clean separation: hardware lib depends on service, not CommonAPI directly

Step 3: Configure Runtime Environment

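Ensure vhal/vhal-default-service-v3.rc sets the required vsomeip and CommonAPI environment variables. In Android init syntax these are setenv options inside the service block, with the name and value separated by a space:

setenv VSOMEIP_CONFIGURATION /vendor/etc/vsomeip.json
setenv VSOMEIP_APPLICATION_NAME VehicleHAL
setenv COMMONAPI_CONFIG /vendor/etc/commonapi.ini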

Step 4: Build

source build/envsetup.sh
lunch aosp_rpi5-userdebug
m android.hardware.automotive.vehicle@V3-default-service

Repository Structure Reference

CommonApiInVhal/
├── Android.bp                              # Root build file (copy to aidl/)
├── hardware/                               # Hardware implementation
│   ├── Android.bp
│   ├── commonapi_bridge/                   # CommonAPI generated code + stub impl
│   ├── include/                            # FakeVehicleHardware.h (no CommonAPI includes)
│   └── src/                                # Implementation files
└── vhal/                                   # Main service
    ├── Android.bp                          # Service build config with CommonAPI deps
    ├── commonapi_bridge/                   # CommonAPI bridge in service context
    ├── include/                            # Service headers
    ├── src/                                # DefaultVehicleHal + VehicleService
    └── vhal-default-service-v3.rc          # Init script with env vars

3. System Architecture

Components Involved

  • Platform: Raspberry Pi 5 running Android 15 AOSP
  • Vehicle HAL: android.hardware.automotive.vehicle@V3-default-service (main executable)
  • Hardware Implementation: libFakeVehicleHardware-V3.so (shared library)
  • Middleware: CommonAPI C++ with SOME/IP binding (libCommonAPI.so, libCommonAPI-SomeIP.so)
  • Communication: SendFromAospToYoctoStubImpl (service) and SendFromYoctoToAospProxy (client)

Initial (Failed) Architecture

Main Executable (android.hardware.automotive.vehicle@V3-default-service)
  ├─ Includes FakeVehicleHardware.h (which included CommonAPI.hpp)
  └─ dlopen() → libFakeVehicleHardware-V3.so
       ├─ Contains CommonAPI init code
       ├─ Calls CommonAPI::Runtime::get()  ← Creates instance #2
       └─ Creates Stub/Proxy objects        ← Fails (empty runtime)

Working Architecture

Main Executable (android.hardware.automotive.vehicle@V3-default-service)
  ├─ Includes FakeVehicleHardware.h (CLEAN - no CommonAPI)
  ├─ Contains CommonAPI init code
  ├─ Calls CommonAPI::Runtime::get()        ← Creates instance #1 (ONLY ONE)
  ├─ Creates Stub/Proxy objects             ← Success
  ├─ Stores in global static singleton
  └─ dlopen() → libFakeVehicleHardware-V3.so
       └─ Accesses stub/proxy via bridge     ← Uses instance #1
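
The working layout can be sketched as a single-owner bridge. This is a minimal sketch, not the repository's actual code: `CommonApiBridge`, `AospToYoctoStub`, and `YoctoToAospProxy` are hypothetical stand-ins for the bridge and the generated CommonAPI types. The point it illustrates is that the accessor is defined out-of-line in exactly one image, so every caller reaches the same storage.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-ins for the generated CommonAPI stub and proxy types.
struct AospToYoctoStub { std::string lastSent; };
struct YoctoToAospProxy { bool available = false; };

// Declared in a header the hardware library can include (no CommonAPI headers).
class CommonApiBridge {
public:
    static CommonApiBridge& instance();  // declared only; defined once below
    void publish(std::shared_ptr<AospToYoctoStub> stub,
                 std::shared_ptr<YoctoToAospProxy> proxy);
    std::shared_ptr<AospToYoctoStub> stub() const;
    std::shared_ptr<YoctoToAospProxy> proxy() const;
private:
    CommonApiBridge() = default;
    mutable std::mutex mutex_;
    std::shared_ptr<AospToYoctoStub> stub_;
    std::shared_ptr<YoctoToAospProxy> proxy_;
};

// In a real build these definitions would sit in a .cpp file linked into the
// executable only, so the process holds exactly one copy of the static.
CommonApiBridge& CommonApiBridge::instance() {
    static CommonApiBridge bridge;
    return bridge;
}

void CommonApiBridge::publish(std::shared_ptr<AospToYoctoStub> stub,
                              std::shared_ptr<YoctoToAospProxy> proxy) {
    assert(stub && proxy);  // publish only fully created objects
    std::lock_guard<std::mutex> lock(mutex_);
    stub_ = std::move(stub);
    proxy_ = std::move(proxy);
}

std::shared_ptr<AospToYoctoStub> CommonApiBridge::stub() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return stub_;
}

std::shared_ptr<YoctoToAospProxy> CommonApiBridge::proxy() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return proxy_;
}
```

The executable would call `publish()` once after creating the stub and proxy; the hardware library only ever calls `stub()` and `proxy()`.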

4. Problem Manifestation

4.1 Primary Symptoms

  1. Silent Initialization Failure: No stub or proxy objects were created, even though the code logic was valid
  2. Log Evaporation: Logs were cleared after RPi5 boot due to a ring buffer reset (a red herring)
  3. Runtime Null Returns: CommonAPI::Runtime::get() returned a valid pointer, but registerService() returned false or buildProxy() returned null
  4. No Error Messages: ALOGD (debug-level) logs were filtered out in production builds, and the failure paths in CommonAPI returned nullptr without logging an error

4.2 Diagnostic Evidence

Log Analysis:

# Search for CommonAPI showed NOTHING (filtered debug logs)
adb logcat | grep -i commonapi
^C  # Empty

# Search for init function showed nothing
adb logcat | grep -i initCommonAPIBridge
^C  # Empty

Process Verification:

# Service running but non-functional
adb shell ps | grep vehicle
vehicle_network 284 1 ... S android.hardware.automotive.vehicle@V3-default-service

# Library loaded but isolated
adb shell cat /proc/[pid]/maps | grep vehicle
/vendor/lib64/libFakeVehicleHardware-V3.so
/vendor/bin/hw/android.hardware.automotive.vehicle@V3-default-service

5. Root Cause Analysis

5.1 The Inline Function Trap

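Definition: In C++, functions defined inside a class definition or marked inline in a header are compiled into every translation unit that includes the header and uses them. Within one linked image (executable or shared library) the linker merges these copies, but each image that uses the function still carries its own copy, including its own function-local static variables, unless the dynamic linker resolves every reference to a single exported definition.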

CommonAPI's Implementation:

// Simplified model of the header-defined accessor (not the verbatim CommonAPI source)
class Runtime {
public:
    static inline Runtime* get() {  // Inline function!
        static Runtime* instance = createRuntime();  // Static local variable
        return instance;
    }
};

Compilation Outcome:

  • FakeVehicleHardware.o: Contains machine code for Runtime::get() + storage for instance, which ends up at offset 0x2000 in the shared library (addresses here are illustrative)
  • main.o: Contains machine code for Runtime::get() + storage for instance, which ends up at offset 0x1000 in the executable
  • Result: Two separate functions at different addresses with two separate static variables
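
The outcome above can be reproduced in miniature inside one program. The sketch below is an analogy under stated assumptions: the two namespaces `main_exe` and `fake_hw_lib` are hypothetical stand-ins for the two linked images, and `Runtime` is a toy type, not the CommonAPI class. Each namespace gets its own copy of the accessor and therefore its own static instance.

```cpp
#include <cassert>

// Toy runtime: just counts registered factories.
struct Runtime { int factoryCount = 0; };

// Stand-in for the copy of the accessor compiled into the executable.
namespace main_exe {
inline Runtime* get() {
    static Runtime instance;  // storage belongs to this copy only
    return &instance;
}
}  // namespace main_exe

// Stand-in for the copy compiled into the shared library.
namespace fake_hw_lib {
inline Runtime* get() {
    static Runtime instance;  // a second, unrelated object
    return &instance;
}
}  // namespace fake_hw_lib

// Registration goes through one copy, lookup through the other.
inline bool lookupSeesRegistration() {
    assert(main_exe::get() != fake_hw_lib::get());  // two distinct instances
    main_exe::get()->factoryCount += 1;             // what the binding library does
    return fake_hw_lib::get()->factoryCount > 0;    // what the hardware library sees
}
```

`lookupSeesRegistration()` returns false: the registration landed in one instance while the lookup read the other, which is the divergence this section describes.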

5.2 Dynamic Shared Objects (DSOs) and Memory Layout

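DSO Definition: Dynamic Shared Objects (.so files on Linux/Android) are loadable modules. All DSOs share the process's single virtual address space, but each one is mapped with its own code and data segments, so a static variable compiled into two DSOs occupies two distinct addresses.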

Memory Architecture:

Process Virtual Memory
├─ Main Executable (.text, .bss, .data)
│   └─ instance #1 @ 0x55a3f2c00800
│
└─ libFakeVehicleHardware-V3.so (.text, .bss, .data)
    └─ instance #2 @ 0x7f4a2a802150

ELF Sections:

  • .text: Code section (contains Runtime::get() machine code)
  • .bss: Zero-initialized data section (holds static Runtime* instance, initially NULL)
  • .init_array: Constructor table (.ctors on older toolchains); the dynamic linker runs these static initializers when the image loads

5.3 Symbol Visibility and Linker Resolution

Symbol Bindings:

  • GLOBAL: Exported symbols visible to other DSOs
  • LOCAL: Private symbols visible only within the same DSO
  • WEAK: Exported symbols that a GLOBAL definition of the same name can override

The Binding Problem: When libCommonAPI-SomeIP.so loads, its static constructor calls Runtime::get(). The dynamic linker searches:

  1. Global Scope → Finds Runtime::get in Main Executable (0x1000) ← BINDS HERE
  2. Skips local copies in other libraries

When code inside libFakeVehicleHardware-V3.so calls Runtime::get(), it uses the library's local copy (0x2000), because the compiler emitted a direct call to that copy rather than a call resolved by the dynamic linker.

Result:

  • Factory registration occurs at address 0x1000 (Main Exe)
  • Application logic reads from address 0x2000 (FakeVehicleHardware)
  • Divergent state: One has factories, one doesn't

5.4 Static Initialization Order Fiasco

Initialization Sequence:

  1. Main executable loads → global constructors run
  2. dlopen("libFakeVehicleHardware-V3.so") triggers:
    • libCommonAPI.so loads
    • libCommonAPI-SomeIP.so loads → Its static ctor runs immediately
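    • libFakeVehicleHardware-V3.so loads → Global ctors run

Load-Order Dependence: If libCommonAPI-SomeIP.so's constructor runs before FakeVehicleHardware's code executes, it registers with whichever Runtime::get() is globally visible at that moment (the Main Exe's copy). The outcome is decided by load order, not by thread timing, so waiting or retrying cannot change it.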

The Loop Trap: Attempting to retry in a loop fails because:

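// Fragment; needs #include <chrono> and #include <thread>
for (int i = 0; i < 100; ++i) {
    auto rt = CommonAPI::Runtime::get();  // Returns the same cached instance on every call
    if (rt) break;  // rt is a valid pointer, but to the WRONG instance
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
}
// Static locals are initialized only ONCE, then cached for the life of the process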
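
Why retrying cannot help is easy to check with a toy accessor. In this sketch `Runtime`, `createRuntime()`, and `getRuntime()` are hypothetical stand-ins that mirror the simplified model above; the counter shows that the initializer of a function-local static runs exactly once, no matter how often the accessor is called.

```cpp
#include <cassert>

struct Runtime {};

// Counts how many times the initializer actually ran.
inline int& createCalls() {
    static int calls = 0;
    return calls;
}

inline Runtime* createRuntime() {
    ++createCalls();
    static Runtime runtime;
    return &runtime;
}

inline Runtime* getRuntime() {
    static Runtime* instance = createRuntime();  // evaluated on the first call only
    return instance;
}

// Returns true if any retry ever observed a different instance.
inline bool retryChangesInstance(int attempts) {
    assert(attempts > 0);
    Runtime* first = getRuntime();
    for (int i = 0; i < attempts; ++i) {
        if (getRuntime() != first) return true;
    }
    return false;
}
```

However many attempts are made, the accessor keeps handing back the instance chosen on the first call, so a retry loop can only confirm the wrong instance, never replace it.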

5.5 CommonAPI Factory Registration Mechanism

Internal Architecture:

// Inside libCommonAPI-SomeIP.so
struct SomeIpBinding {
    SomeIpBinding() {
        // Automatic registration on library load
        CommonAPI::Runtime::get()->registerFactory("someip", this);
    }
};
static SomeIpBinding g_autoRegister;  // Global static object

The Factory Registry:

  • Runtime contains std::map<std::string, Factory*> factories_
  • registerService() looks up factory by binding name ("someip")
  • If no factory registered for that binding name, operations fail
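
The registry behaviour in the list above can be modelled in a few lines. This is a sketch of the simplified model used in this document, not the real CommonAPI interface: `Runtime`, `Factory`, `registerFactory()`, and the one-argument `registerService()` are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <string>

struct Factory { std::string binding; };

class Runtime {
public:
    // Called by the binding library when it loads.
    void registerFactory(const std::string& binding, Factory* factory) {
        assert(factory != nullptr);
        factories_[binding] = factory;
    }

    // Succeeds only if a factory for the binding was registered on THIS instance.
    bool registerService(const std::string& binding) const {
        return factories_.find(binding) != factories_.end();
    }

private:
    std::map<std::string, Factory*> factories_;
};
```

With two `Runtime` objects, registering the "someip" factory on one leaves the other's map empty, so `registerService("someip")` succeeds on the first and fails on the second without any error being raised.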

6. Execution Timeline Analysis

6.1 Failure Scenario (Initial Architecture)

Time 0.0: Android init starts main executable

Load /vendor/bin/hw/android.hardware.automotive.vehicle@V3-default-service
  ├─ Parse ELF headers
  ├─ Map segments into memory
  └─ Run .ctors section (static constructors)
     └─ (None related to CommonAPI yet)
  Note: Header includes caused Runtime::get() code to exist at 0x1000
        and instance storage at 0x55a3f...

Time 1.0: HAL calls setCallback(), triggering dlopen

dlopen("libFakeVehicleHardware-V3.so", RTLD_NOW)
  ├─ Load dependencies:
  │   ├─ libCommonAPI.so (base runtime library)
  │   └─ libCommonAPI-SomeIP.so (SOME/IP binding)
  │       └─ Execute .ctors:
  │           SomeIpBinding::SomeIpBinding()
  │             └─ Runtime::get() → Resolves to Main Exe @ 0x1000
  │                 └─ Returns instance @ 0x55a3f... (empty, uninitialized)
  │             └─ registerFactory("someip", this) → Stored at Main Exe instance
  │
  └─ Load libFakeVehicleHardware-V3.so
      └─ Execute .ctors:
          FakeVehicleHardware::FakeVehicleHardware()
            └─ initCommonAPIBridge() [detached thread or delayed]
                └─ Runtime::get() → Uses LOCAL copy @ 0x2000
                    └─ Returns instance @ 0x7f4a... (different object!)
                └─ registerService("local", ...) 
                    └─ Checks local instance factories map → EMPTY
                    └─ Returns FALSE (no "someip" factory found)

Time 2.0: Application runs

FakeVehicleHardware tries to use stub/proxy
  └─ Objects are null or invalid
  └─ No ERROR logs (ALOGD filtered)
  └─ Silent failure

6.2 Success Scenario (Working Architecture)

Time 0.0: Main executable loads with CommonAPI code

Main executable contains:
  ├─ CommonAPI initialization code
  ├─ Explicit init function (not static constructor)
  └─ Single Runtime::get() copy at 0x1000

Time 1.0: HAL initialization

Explicit call to initCommonAPI() in main()
  ├─ libCommonAPI-SomeIP.so already loaded (dependency)
  ├─ Runtime::get() @ 0x1000 → Returns single instance
  ├─ libCommonAPI-SomeIP factory auto-registered to this instance
  ├─ registerService() succeeds (finds "someip" factory)
  ├─ buildProxy() succeeds
  └─ Store objects in global static variables

Time 2.0: Load FakeVehicleHardware

dlopen("libFakeVehicleHardware-V3.so")
  ├─ No CommonAPI code in this library (clean header)
  ├─ Library calls back to main exe via function pointers
  └─ Uses pre-created stub/proxy objects from global storage
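
The "calls back via function pointers" step might look like the sketch below. Everything here is an assumption made for illustration: `BridgeCallbacks`, the `service` namespace, and `FakeHardwareModel` are hypothetical names, and the real bridge may be shaped differently. What the sketch shows is that the library side compiles against a plain table of function pointers and never needs a CommonAPI header.

```cpp
#include <cassert>
#include <cstdint>

// Plain table the executable hands to the hardware library.
struct BridgeCallbacks {
    bool (*sendToYocto)(std::int32_t propId, std::int32_t value);
    bool (*isProxyAvailable)();
};

// Executable side: the only place that would touch the stub and proxy.
namespace service {
inline std::int32_t& lastValue() {
    static std::int32_t value = 0;
    return value;
}
inline bool sendToYocto(std::int32_t /*propId*/, std::int32_t value) {
    lastValue() = value;  // a real implementation would fire the stub here
    return true;
}
inline bool isProxyAvailable() { return true; }
inline BridgeCallbacks makeCallbacks() {
    return BridgeCallbacks{&sendToYocto, &isProxyAvailable};
}
}  // namespace service

// Library side: depends on BridgeCallbacks only.
class FakeHardwareModel {
public:
    explicit FakeHardwareModel(const BridgeCallbacks& callbacks) : callbacks_(callbacks) {
        assert(callbacks_.sendToYocto != nullptr);
        assert(callbacks_.isProxyAvailable != nullptr);
    }
    bool setValue(std::int32_t propId, std::int32_t value) {
        return callbacks_.isProxyAvailable() && callbacks_.sendToYocto(propId, value);
    }
private:
    BridgeCallbacks callbacks_;
};
```

Constructing `FakeHardwareModel` with `service::makeCallbacks()` and calling `setValue()` routes the write through the executable's functions, which is where the single runtime instance lives.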

7. Technical Deep Dive

7.1 Memory Address Visualization

Failed State Memory Map:

Address Space Layout (64-bit ARM):
0x000055a3f2c00000  Main Executable .bss section
       │
       ├─ 0x55a3f2c00800: instance #1 (Main Exe)
       │                  factories["someip"] = 0x7f4a... (valid)
       │                  [Data populated by SOME/IP lib]
       │
0x00007f4a2a800000  libFakeVehicleHardware-V3.so .bss section
       │
       ├─ 0x7f4a2a802150: instance #2 (Shared Lib)
       │                  factories = empty map
       │                  [Default initialized, never used by SOME/IP]
       │
0x00007f4a2b000000  libCommonAPI-SomeIP.so .text section
       │
       └─ SomeIpBinding ctor: Bound to Main Exe's Runtime::get

7.2 Why the Main Executable Had a Copy

Transitive Include Chain: Even though main.cpp never explicitly called Runtime::get(), it included:

// main.cpp
#include "FakeVehicleHardware.h"

// FakeVehicleHardware.h
#include <CommonAPI/CommonAPI.hpp>

Compilation Effect:

  • Preprocessor expanded CommonAPI.hpp into main.cpp translation unit
  • Compiler generated code for inline Runtime::get() in main.o
  • Linker included this code in final executable
  • Result: Main exe has its own .bss entry for instance variable

Symbol Table Evidence:

readelf -s android.hardware.automotive.vehicle@V3-default-service | grep Runtime
# Output: 0000000000023b10  45 FUNC  GLOBAL DEFAULT  12 _ZN7CommonAPI7Runtime3getEv
# The function EXISTS in the executable even if never called directly

7.3 Why Factories Registered in Wrong Instance

Dynamic Linker Symbol Resolution Order: When libCommonAPI-SomeIP.so loads, it has an unresolved symbol reference to CommonAPI::Runtime::get.

Resolution Algorithm:

  1. Check local scope (libCommonAPI-SomeIP.so itself) → Not found
  2. Check global scope (previously loaded DSOs):
    • Main executable is already in memory
    • Main executable exports Runtime::get (GLOBAL binding)
    • Match found at 0x1000 ← Binds here
  3. Stop searching (first match wins)

Alternative Binding: If the main executable had not exported the symbol, the dynamic linker would have continued searching:

  • libCommonAPI.so → Found base implementation
  • Bound to shared copy

But because main exe did have it (due to header include), it "stole" the binding.
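
The first-match-wins rule can be modelled as a lookup over an ordered list of loaded images. This is a toy model, not the bionic linker's implementation; `Image`, `Resolution`, and `resolve()` are hypothetical names, and the addresses are the illustrative ones used in this document.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One loaded image and the symbols it exports.
struct Image {
    std::string name;
    std::map<std::string, std::uintptr_t> exported;  // symbol name -> address
};

struct Resolution {
    std::string image;
    std::uintptr_t address;
    bool found;
};

// Walks the images in search order and binds to the first one that exports the symbol.
inline Resolution resolve(const std::vector<Image>& searchOrder, const std::string& symbol) {
    assert(!symbol.empty());
    for (const Image& image : searchOrder) {
        const auto it = image.exported.find(symbol);
        if (it != image.exported.end()) {
            return Resolution{image.name, it->second, true};
        }
    }
    return Resolution{"", 0, false};
}
```

If the executable is first in the search order and exports `_ZN7CommonAPI7Runtime3getEv`, `resolve()` binds to it; remove that export and the same lookup falls through to the next image that provides the symbol.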


8. Alternative Solutions

While the primary solution (moving CommonAPI to main executable) was implemented, these alternatives exist:

Option A: PIMPL (Pointer to Implementation)

If CommonAPI must stay in FakeVehicleHardware:

// FakeVehicleHardware.h
#include <memory>
class CommonAPIBridge;  // Forward declaration only

class FakeVehicleHardware {
public:
    FakeVehicleHardware();
    ~FakeVehicleHardware();  // Defined in the .cpp, where CommonAPIBridge is a complete type
private:
    std::unique_ptr<CommonAPIBridge> m_bridge;  // Opaque pointer
};

Move all CommonAPI code to CommonAPIBridge.cpp, so that the main exe never includes the CommonAPI headers.
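
A complete, single-file version of the PIMPL split is sketched below, with the header part and the implementation part marked by comments. The `CommonAPIBridge` body and the `send()` method are hypothetical placeholders; in a real build the implementation part would include the CommonAPI headers and the header part would not. The destructor is declared in the class and defined after `CommonAPIBridge` is complete, which `std::unique_ptr` requires for an incomplete type.

```cpp
#include <cassert>
#include <memory>
#include <string>

// ---- FakeVehicleHardware.h (what the executable includes) ----
class CommonAPIBridge;  // forward declaration; no middleware headers here

class FakeVehicleHardware {
public:
    FakeVehicleHardware();
    ~FakeVehicleHardware();  // defined below, where CommonAPIBridge is complete
    bool send(const std::string& payload);
private:
    std::unique_ptr<CommonAPIBridge> m_bridge;
};

// ---- FakeVehicleHardware.cpp (compiled into the shared library only) ----
// Placeholder implementation: records the payload instead of calling CommonAPI.
class CommonAPIBridge {
public:
    bool send(const std::string& payload) {
        last_ = payload;
        return !payload.empty();
    }
private:
    std::string last_;
};

FakeVehicleHardware::FakeVehicleHardware()
    : m_bridge(std::make_unique<CommonAPIBridge>()) {}

FakeVehicleHardware::~FakeVehicleHardware() = default;

bool FakeVehicleHardware::send(const std::string& payload) {
    assert(m_bridge != nullptr);
    return m_bridge->send(payload);
}
```

Callers see only `FakeVehicleHardware`; nothing about the middleware leaks through the header, so including it cannot pull a second copy of any CommonAPI inline code into the executable.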

Option B: Explicit Runtime Passing

Don't rely on the singleton:

// Main exe creates runtime
auto runtime = CommonAPI::Runtime::get();

// Pass explicitly to library
FakeVehicleHardware::init(runtime);
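
Explicit passing can be sketched as plain constructor injection. `Runtime` and `HardwareModule` below are hypothetical stand-ins; the idea is only that the library never calls a global accessor, so it cannot end up with a different instance than the one the executable created.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy runtime; the single field marks whether a factory has been registered.
struct Runtime { int factories = 1; };

class HardwareModule {
public:
    // The module uses only the runtime it is handed.
    explicit HardwareModule(std::shared_ptr<Runtime> runtime)
        : runtime_(std::move(runtime)) {
        assert(runtime_ != nullptr);
    }
    const Runtime* runtime() const { return runtime_.get(); }
    bool canRegister() const { return runtime_->factories > 0; }
private:
    std::shared_ptr<Runtime> runtime_;
};
```

The executable creates the runtime once and hands the same `std::shared_ptr` to every component that needs it; comparing `runtime()` across components then becomes a direct check that they share one instance.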

Option C: Visibility Attributes

Force CommonAPI symbols to be hidden in the main executable:

// In main.cpp, before includes
#pragma GCC visibility push(hidden)
#include <CommonAPI/CommonAPI.hpp>
#pragma GCC visibility pop

This prevents main exe from exporting Runtime::get, forcing binding to library copy.


9. Prevention Guidelines

For Android HAL Development

  1. Header Hygiene: Never include middleware headers (CommonAPI, DBus, etc.) in public HAL headers
  2. Forward Declarations: Use forward declarations and opaque pointers (void* or PIMPL)
  3. Singleton Ownership: Decide which DSO owns singleton lifecycle and enforce it
  4. Deferred Initialization: Never call complex init in global constructors; use explicit init() functions
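
Guideline 4 can be sketched as an explicit, idempotent init step that reports its result instead of failing silently in a static constructor. `CommonApiLifecycle` is a hypothetical name, and the injected `step` callback stands in for the real registerService/buildProxy sequence.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

class CommonApiLifecycle {
public:
    // `step` performs the actual initialization and returns true on success.
    explicit CommonApiLifecycle(std::function<bool()> step) : step_(std::move(step)) {
        assert(step_ != nullptr);
    }

    // Safe to call repeatedly; the step runs once and its result is remembered.
    bool init() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!attempted_) {
            attempted_ = true;
            ok_ = step_();
        }
        return ok_;
    }

private:
    std::function<bool()> step_;
    std::mutex mutex_;
    bool attempted_ = false;
    bool ok_ = false;
};
```

Calling `init()` from `main()` and logging a false result at error level avoids the situation described earlier, where a failed registration left no trace in the logs.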

For C++ Shared Library Design

  1. Avoid inline static in headers across library boundaries
  2. Use __attribute__((visibility("hidden"))) for internal symbols
  3. Explicit template instantiation: Don't rely on inline template functions for cross-DSO singletons
  4. Link-time verification: Check readelf -s to ensure symbols aren't duplicated

Debugging Checklist

When stub/proxy creation fails:

# 1. Verify symbol duplication
readelf -s libA.so | grep Runtime
readelf -s main_exe | grep Runtime

# 2. Check which libraries loaded
cat /proc/[pid]/maps | grep -E "\.so$|default-service$"

# 3. Verify singleton addresses
# Add logs: ALOGE("Runtime instance: %p", Runtime::get());
# Compare addresses from different components

# 4. Check factory registration
# Add logs in binding library to confirm registration target address
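
For step 3, an address alone is hard to interpret, so it can help to log which mapping it falls into. The helper below is a Linux/Android-only sketch that scans /proc/self/maps; `mappingFor()` is a hypothetical name. On Android the .bss pages of a library may show up under an anonymous label rather than the library path, so the neighbouring lines of the maps file may be needed to identify the owner.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Returns the /proc/self/maps line whose address range contains addr, or "" if none.
inline std::string mappingFor(const void* addr) {
    assert(addr != nullptr);
    const auto target = reinterpret_cast<std::uintptr_t>(addr);
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        std::istringstream fields(line);
        std::string range;
        fields >> range;  // first field is "start-end" in hex
        const auto dash = range.find('-');
        if (dash == std::string::npos) continue;
        const auto start = std::stoull(range.substr(0, dash), nullptr, 16);
        const auto end = std::stoull(range.substr(dash + 1), nullptr, 16);
        if (target >= start && target < end) return line;
    }
    return "";
}
```

Logging `mappingFor(ptr)` next to the `%p` value from each component makes it visible when two "singleton" pointers live in the data segments of different images.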

10. Appendix: Verification Commands

# Check for duplicate symbols across binaries
readelf -s /vendor/bin/hw/android.hardware.automotive.vehicle@V3-default-service | c++filt | grep -i runtime
readelf -s /vendor/lib64/libFakeVehicleHardware-V3.so | c++filt | grep -i runtime

# Monitor library load order (dlopen/dlsym are library calls, not syscalls, so trace the underlying openat)
adb shell strace -f -e trace=openat /vendor/bin/hw/android.hardware.automotive.vehicle@V3-default-service 2>&1 | grep -i commonapi

# Check kernel-log messages from init about service start-up (static constructors themselves are not logged)
adb shell dmesg -T | grep -i "init\|constructor"

# Verify memory layout
adb shell cat /proc/$(pidof android.hardware.automotive.vehicle@V3-default-service)/maps \
  | grep -E "(CommonAPI|vehicle)"

11. Glossary

| Term | Definition |
|------|------------|
| DSO | Dynamic Shared Object (.so file); loadable module with code and data |
| BSS | Block Started by Symbol; section for zero-initialized global/static variables |
| ELF | Executable and Linkable Format; Linux/Android binary format |
| Inline Function | Function marked inline or defined inside a class body; it may be defined in every translation unit that uses it, and the compiler may expand it at call sites |
| Linker Namespace | Android's isolation mechanism for dynamic linking; prevents symbol conflicts |
| PIMPL | Pointer to Implementation; opaque pointer pattern to hide implementation details |
| RTLD_GLOBAL | dlopen flag making symbols available to subsequently loaded libraries |
| Singleton | Design pattern ensuring only one instance of a class exists |
| Static Local | Variable inside function scope with static storage duration (lives for program life) |
| Symbol Resolution | Process of linking symbolic names to memory addresses at load time |
| VHAL | Vehicle Hardware Abstraction Layer; Android Automotive HAL for vehicle data |
| Visibility | ELF symbol attribute (DEFAULT, HIDDEN, PROTECTED, INTERNAL) controlling whether other DSOs can reference a symbol; distinct from binding (GLOBAL, WEAK, LOCAL) |

Document Status: Analysis Complete
Affected System: Android 15 AOSP on Raspberry Pi 5 with CommonAPI-SOME/IP
Resolution: Single-DSO ownership of CommonAPI lifecycle via Main Executable Bridge Pattern