android - Producing optimised NDK code for multiple architectures?
I have some C code for Android that does lots of low-level number crunching. I'd like to know what settings I should use (e.g. in my Android.mk and Application.mk files) so that the code produced will run on all current Android devices but also take advantage of optimisations for specific chipsets. I'm looking for good default Android.mk and Application.mk settings to use, and I want to avoid having to litter my C code with #ifdef branches.
For example, I'm aware that ARMv7 has floating point instructions, that some ARMv7 chips support NEON instructions, and that the default ARM target supports neither of these. Is it possible to set flags so that I can build ARMv7 with NEON, ARMv7 without NEON, and the default ARM build? I'd like to know how to do at least the latter two, if not all three. I'm cautious about which settings I use, as I assume the current defaults are the safest settings, and I'd like to know what risks the other options carry.
For GCC-specific optimisation, I'm using the following flags:
LOCAL_CFLAGS=-ffast-math -O3 -funroll-loops
I've checked that all three of these speed up my code. Are there any other common ones I could add?
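A small sketch of why -ffast-math tends to help number-crunching loops: floating-point addition is not associative, so without it GCC must evaluate a reduction like the one below in strict source order, which stops it from splitting the sum into independent partial sums that can be vectorised or unrolled. -ffast-math (which implies -fassociative-math) lifts that restriction, at the cost of results that may differ in the last bits and of assuming there are no NaNs or infinities. The function name sum_floats is hypothetical.

```c
#include <stddef.h>

/* Hypothetical example: sum an array of floats.
 * Without -ffast-math, GCC must add the elements strictly in order,
 * because (a + b) + c is not always equal to a + (b + c) in floating point.
 * With -ffast-math it may reassociate the additions into several partial
 * sums (and vectorise/unroll the loop), so the result can differ slightly
 * from the strict in-order sum. */
float sum_floats(const float *x, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i)
        total += x[i];
    return total;
}
```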
Another tip I've seen is to add "LOCAL_ARM_MODE := arm" to Android.mk to enable a speed-up on newer ARM chips (although I'm confused about what exactly this does and what happens on older chips).
ARM processors support two general instruction sets: "ARM" and "Thumb". Though there are different flavours of both, ARM instructions are 32 bits each and Thumb instructions are 16 bits. The main difference between the two is that an ARM instruction can do more in a single instruction than a Thumb one can. For example, a single ARM instruction can add one register to another register while performing a left shift on the second register. In Thumb, one instruction would have to do the shift, and a second instruction would then do the addition.
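To make the shift-and-add example concrete, the hypothetical function below computes a + (b << 2), a common pattern when scaling an index. In ARM mode GCC can usually emit this as a single instruction with a shifted register operand, while 16-bit Thumb (Thumb-1) needs a separate shift and add. The exact output depends on the compiler version and flags, so compiling with -S and comparing -marm against -mthumb is the way to check what your toolchain actually produces.

```c
#include <stdint.h>

/* Hypothetical helper: returns a + b * 4, written as a shift.
 * Typical ARM-mode output:  add  r0, r0, r1, lsl #2          (one instruction)
 * Typical Thumb-1 output:   lsls r1, r1, #2 / adds r0, r0, r1 (two instructions)
 * Register choice and exact code depend on the compiler. */
uint32_t add_scaled_by_4(uint32_t a, uint32_t b)
{
    return a + (b << 2);
}
```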
ARM instructions are not twice as good, but in certain cases they can be faster. This is especially true of hand-rolled ARM assembly, which can be tuned in novel ways to make the best use of "shifts for free". Thumb instructions have their own advantage as well as size: they drain the battery less.
Anyway, this is what LOCAL_ARM_MODE does: it means you compile your code as ARM instructions instead of Thumb instructions. Compiling to Thumb is the default in the NDK because it tends to create a smaller binary, and the speed difference is not that noticeable for most code. The compiler can't always take advantage of the extra "oomph" that ARM can provide, so you end up needing more or less the same number of instructions anyway.
The results you see from C/C++ code compiled to ARM or to Thumb will be identical (barring compiler bugs).
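If you want to confirm that LOCAL_ARM_MODE := arm actually took effect for a particular file, GCC's predefined macros can tell you at compile time: __thumb__ is defined when compiling Thumb code, __thumb2__ is additionally defined for Thumb-2, and __arm__ is defined for any 32-bit ARM target. The function name below is hypothetical; the idea is to log its result once at start-up rather than to branch on it inside your number-crunching code.

```c
/* Hypothetical diagnostic: reports which instruction set this translation
 * unit was compiled for, using GCC's predefined macros. */
const char *compiled_instruction_set(void)
{
#if defined(__thumb2__)
    return "thumb-2";          /* e.g. armeabi-v7a built in Thumb mode */
#elif defined(__thumb__)
    return "thumb";            /* 16-bit Thumb-1, e.g. default armeabi */
#elif defined(__arm__)
    return "arm";              /* LOCAL_ARM_MODE := arm */
#else
    return "not 32-bit ARM";   /* e.g. when compiled on a desktop host */
#endif
}
```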
This is compatible between new and old ARM processors for all Android phones available today. That's because, by default, the NDK compiles to an "Application Binary Interface" for ARM-based CPUs that support the ARMv5TE instruction set. This ABI is known as "armeabi" and can be set explicitly in Application.mk by putting APP_ABI := armeabi.
Newer processors also support the Android-specific ABI known as armeabi-v7a, which extends armeabi to add the Thumb-2 instruction set and a hardware floating point instruction set called VFPv3-D16. armeabi-v7a compatible CPUs can also optionally support the NEON instruction set, which you have to check for at run time, providing code paths for when it is available and when it is not. There's an example of this in the ndk/samples directory (hello-neon). Under the hood, Thumb-2 is more "ARM-like" in that its instructions can do more in a single instruction, while still having the advantage of taking up less space.
In order to compile a "fat binary" that contains both armeabi and armeabi-v7a libraries, you would add the following to Application.mk:
APP_ABI := armeabi armeabi-v7a
When the .apk file is installed, the Android package manager installs the best library for the device. So on older platforms it would install the armeabi library, and on newer devices the armeabi-v7a one.
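Because each ABI listed in APP_ABI gets its own build of your sources, the code can report which library the package manager actually installed, which is handy when checking the three builds from the question. This minimal sketch uses GCC's __ARM_ARCH_7A__ (defined when targeting ARMv7-A) and __ARM_NEON__ (defined only when NEON code generation is enabled, which is not the case for a default armeabi-v7a build); the function name is hypothetical.

```c
/* Hypothetical helper: names the ABI this copy of the library was built for.
 * With APP_ABI := armeabi armeabi-v7a the NDK compiles the sources once per
 * ABI, so each .so in the .apk returns its own string. */
const char *compiled_abi(void)
{
#if defined(__arm__) && defined(__ARM_ARCH_7A__)
#  if defined(__ARM_NEON__)
    return "armeabi-v7a (NEON enabled at compile time)";
#  else
    return "armeabi-v7a";
#  endif
#elif defined(__arm__)
    return "armeabi";
#else
    return "not a 32-bit ARM build";
#endif
}
```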
If you want to test for CPU features at run time, you can use the NDK function uint64_t android_getCpuFeatures() to get the features supported by the processor. It returns a bit mask in which ANDROID_CPU_ARM_FEATURE_ARMv7 is set on v7a processors, ANDROID_CPU_ARM_FEATURE_VFPv3 is set if hardware floating point is supported, and ANDROID_CPU_ARM_FEATURE_NEON is set if the advanced SIMD instructions are supported. An ARM CPU can't have NEON without VFPv3.
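Testing the mask is just a bitwise AND. So that the decision logic can be compiled and tested off-device, the sketch below takes the mask as a parameter. The FEAT_* constants are hypothetical stand-ins for the ANDROID_CPU_ARM_FEATURE_* flags from the NDK's cpu-features.h; on a device you would pass in the result of android_getCpuFeatures(), after checking that android_getCpuFamily() returns ANDROID_CPU_FAMILY_ARM. It also encodes the rule that NEON is only used together with VFPv3.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the NDK's ANDROID_CPU_ARM_FEATURE_* flags.
 * In real code, include <cpu-features.h> and use the NDK constants instead. */
#define FEAT_VFPV3 ((uint64_t)1 << 1)
#define FEAT_NEON  ((uint64_t)1 << 2)

enum code_path { PATH_GENERIC, PATH_VFP, PATH_NEON };

/* Picks the best code path for a feature mask such as the one returned by
 * android_getCpuFeatures(). NEON is only trusted when VFPv3 is also
 * reported, since an ARM CPU cannot have NEON without VFPv3. */
enum code_path choose_code_path(uint64_t features)
{
    int has_vfp  = (features & FEAT_VFPV3) != 0;
    int has_neon = (features & FEAT_NEON) != 0;

    if (has_neon && has_vfp)
        return PATH_NEON;
    if (has_vfp)
        return PATH_VFP;
    return PATH_GENERIC;
}
```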
In summary: by default, your programs are the most compatible. Using LOCAL_ARM_MODE may make things faster at the expense of battery life, due to the use of ARM instructions, and it is just as compatible as the default set-up. Adding the APP_ABI := armeabi armeabi-v7a line will give you improved performance on newer devices while remaining compatible with older ones, but your .apk file will be larger (due to having two libraries). In order to use NEON instructions, you need to write special code that detects the capabilities of the CPU at run time, and this only applies to newer devices that can run armeabi-v7a.
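One common way to structure that special code is to choose an implementation once, through a function pointer, so the inner loops never re-check the CPU. All names in this sketch are hypothetical, and scale_neon is only a placeholder that reuses the portable loop so the example compiles anywhere. A real NEON version would live in its own source file compiled with NEON enabled (for example via LOCAL_ARM_NEON := true or the NDK's .neon source-file suffix) and would only ever be called when the run-time check succeeds.

```c
#include <stddef.h>

/* Signature shared by all implementations of the hot routine. */
typedef void (*scale_fn)(float *data, size_t n, float factor);

/* Portable C version: safe on every armeabi and armeabi-v7a device. */
static void scale_generic(float *data, size_t n, float factor)
{
    for (size_t i = 0; i < n; ++i)
        data[i] *= factor;
}

/* Placeholder for a NEON-optimised version. In a real project this would be
 * written with NEON intrinsics in a separately compiled file; here it just
 * calls the portable loop. */
static void scale_neon(float *data, size_t n, float factor)
{
    scale_generic(data, n, factor);
}

/* Chooses the implementation once (e.g. at library load time) so the
 * per-call cost is a single indirect call. On a device, has_neon would come
 * from android_getCpuFeatures() & ANDROID_CPU_ARM_FEATURE_NEON. */
scale_fn select_scale(int has_neon)
{
    return has_neon ? scale_neon : scale_generic;
}
```

Usage would look like scale_fn fn = select_scale(has_neon); followed by fn(samples, count, 0.5f); wherever the routine is needed.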