
APJ3D and Random Forests for Action Recognition

Human Action Recognition Using APJ3D and Random Forests


Method overview:
 
First, we extract the 3D skeletal joint locations from depth images. The APJ3D features are computed from the action depth image sequences using the 3D joint position features and the 3D joint angle features, and are then clustered with the K-means algorithm into representative postures of actions. Using the improved Fourier Temporal Pyramid, we then recognize actions with random forests.
 
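From the Kinect skeleton joint information, 3D joint position features and 3D joint angle features are extracted and combined into a new feature: APJ3D.
15 joints are selected manually (they tolerate small perturbations).
The APJ3D vectors extracted from the training data then pass through K-means clustering, the Fourier Temporal Pyramid and a random forest to produce the final recognition result.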
 

Three major challenges in action recognition:
First is the description of human action. Human action in a video sequence is a dynamic process, characterized not only by the body posture in each frame but also by the order in which postures appear and their continuity in time. Even for the same type of action, different individuals perform it differently because of differences in height, body shape, agility and so on. Quickly extracting simple but effective features therefore remains a major difficulty in human action recognition.
Second is the representation model of human action. Human actions vary considerably, yet they also have strongly compositional structure; how to combine these characteristics into a model with strong discriminative power is an important issue in human action recognition.
Third is the design of an efficient action classification algorithm. Action recognition data are high-dimensional and training data are hard to acquire, so we want a classification algorithm that trains and classifies quickly, performs well and generalizes well.
 

Feature extraction:
 
We start from 20 joints: hip center, spine, shoulder center, head, L/R shoulder, L/R elbow, L/R wrist, L/R hand, L/R hip, L/R knee, L/R ankle and L/R foot.
Among these joints, the hand and wrist, and the foot and ankle, are very close to each other and are thus redundant for characterizing the body configuration.
The final 15 joints are therefore: head, neck, L/R shoulder, L/R elbow, L/R hand, L/R knee, L/R foot, torso center and L/R hip.
Left and right limbs are determined from the perspective of the person facing the Kinect.
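The reduction from 20 to 15 joints can be sketched as a simple name filter. The joint names and the mapping below are hypothetical: the note lists only the two sets, so mapping shoulder center to neck and spine to torso center (and dropping hip center, wrists and ankles) is an assumption, not the paper's stated procedure.

```python
# Hypothetical names for the 20-joint Kinect skeleton (assumption).
KINECT_20 = [
    "hip_center", "spine", "shoulder_center", "head",
    "shoulder_l", "elbow_l", "wrist_l", "hand_l",
    "shoulder_r", "elbow_r", "wrist_r", "hand_r",
    "hip_l", "knee_l", "ankle_l", "foot_l",
    "hip_r", "knee_r", "ankle_r", "foot_r",
]

# Assumed mapping onto the 15 APJ3D joints; wrists, ankles and hip_center
# are dropped. This mapping is hypothetical.
SELECTED = {
    "head": "head",
    "shoulder_center": "neck",
    "spine": "torso",
    "shoulder_l": "shoulder_l", "shoulder_r": "shoulder_r",
    "elbow_l": "elbow_l", "elbow_r": "elbow_r",
    "hand_l": "hand_l", "hand_r": "hand_r",
    "hip_l": "hip_l", "hip_r": "hip_r",
    "knee_l": "knee_l", "knee_r": "knee_r",
    "foot_l": "foot_l", "foot_r": "foot_r",
}


def select_joints(frame):
    """Reduce a {kinect_name: (x, y, z)} frame to the 15 selected joints."""
    return {SELECTED[name]: pos for name, pos in frame.items() if name in SELECTED}
```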
 
Joint angles
Each joint has a geometric position in the global Cartesian coordinate system.
The joints adjacent to the torso are usually called first-degree joints, while joints adjacent to first-degree joints are classified as second-degree joints. First-degree joints include the elbows, the knees and the head, while second-degree joints are the extremities: the hands and feet.
 
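Each joint has two degrees of freedom: a zenith angle θ and an azimuth angle μ (the distance between two connected joints stays constant).
Obtaining the angles requires converting each joint's global coordinates into local coordinates. The paper does not explain this clearly; my understanding is that the orientation and scale of the coordinate system are computed (normalized) from the torso basis, and the connected first-degree and second-degree joints are then computed in that coordinate system.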
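As a sketch of how (θ, μ) can be read off for a first-degree joint, the bone from its torso-side parent (e.g. shoulder to elbow) is projected onto a local orthonormal basis (u, r, t). The convention that θ is measured from u and μ lies in the r-t plane is an assumption; the note does not fix it.

```python
import math


def first_degree_angles(parent, child, basis):
    """Zenith/azimuth of the bone parent -> child in the torso basis (u, r, t).

    Assumed convention: zenith theta is measured from u, azimuth mu lies in
    the r-t plane. The basis must be orthonormal and the bone non-zero.
    """
    d = [c - p for c, p in zip(child, parent)]
    u, r, t = basis
    du = sum(a * b for a, b in zip(d, u))  # local coordinate along u
    dr = sum(a * b for a, b in zip(d, r))  # local coordinate along r
    dt = sum(a * b for a, b in zip(d, t))  # local coordinate along t
    length = math.sqrt(du * du + dr * dr + dt * dt)
    theta = math.acos(max(-1.0, min(1.0, du / length)))
    mu = math.atan2(dt, dr)
    return theta, mu
```

Because bone lengths are treated as constant, the pair (θ, μ) is enough to place the child joint relative to its parent.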
 
 
Joint positions
Our key suggestion is that the pairwise relative positions of the joints yield more discriminative features for representing human movement. Because the coordinates are normalized, the representation is invariant to the absolute body position, the initial body orientation and the body size.
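The note does not spell out the normalization, so the following is only a minimal sketch under stated assumptions: positions are made relative to a hypothetical "torso" joint, scaled by the torso-to-neck distance, and optionally projected onto a torso basis (u, r, t) to remove orientation.

```python
import math


def normalize_skeleton(joints, basis=None, root="torso", ref=("torso", "neck")):
    """Translate, scale and optionally rotate a {name: (x, y, z)} skeleton.

    root: joint used as the origin (position invariance).
    ref: joint pair whose distance is used as the unit length (size invariance).
    basis: optional orthonormal (u, r, t) to express coordinates in
           (orientation invariance). All choices here are assumptions.
    """
    origin = joints[root]
    scale = math.dist(joints[ref[0]], joints[ref[1]])
    out = {}
    for name, p in joints.items():
        d = [(x - o) / scale for x, o in zip(p, origin)]
        if basis is not None:
            d = [sum(di * ei for di, ei in zip(d, e)) for e in basis]
        out[name] = tuple(d)
    return out
```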
 
For each joint i, we extract the pairwise relative position features by taking the difference between the position of joint i and that of each other joint j:
$p_{ij} = p_i - p_j$
The 3D joint feature for joint i is defined as:
$\mathbf{p}_i = \{\, p_{ij} \mid j \neq i \,\}$
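The pairwise definition translates directly into code. This sketch assumes joints are stored as a {name: (x, y, z)} dictionary of already-normalized coordinates; with 15 joints, each joint gets 14 difference vectors (42 numbers).

```python
def pairwise_features(joints):
    """For each joint i, list p_i - p_j for every other joint j."""
    feats = {}
    for i, pi in joints.items():
        feats[i] = [
            tuple(a - b for a, b in zip(pi, pj))
            for j, pj in joints.items()
            if j != i
        ]
    return feats
```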
 
 
APJ3D
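First-degree joints are computed with the same torso basis.
Second-degree joints are computed with a rotated version of the orthonormal torso basis. For example, to obtain the feature of the right hand, define V as the vector from the right shoulder to the right elbow and W as the vector from the right elbow to the right hand. The torso basis is first rotated so that the rotated basis is moved onto the right elbow, and a spherical coordinate system is then defined there.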
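One plausible reading of this step, sketched below, rotates the torso basis (u, r, t) with Rodrigues' formula so that u points along V, then reads the spherical angles of W in the rotated basis. The rotation rule and the angle convention (θ from u, μ in the r-t plane) are assumptions; the note does not state them.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def rotate(v, axis, angle):
    """Rodrigues' formula: rotate v about the unit vector axis by angle."""
    c, s = math.cos(angle), math.sin(angle)
    kxv = cross(axis, v)
    kdv = dot(axis, v)
    return tuple(v[i] * c + kxv[i] * s + axis[i] * kdv * (1 - c) for i in range(3))


def rotated_basis(basis, V):
    """Rotate the orthonormal basis (u, r, t) so that u points along V."""
    u = basis[0]
    vn = normalize(V)
    axis = cross(u, vn)
    s = math.sqrt(dot(axis, axis))
    if s < 1e-12:
        if dot(u, vn) > 0:
            return basis  # already aligned
        # Anti-parallel: a half turn about r (perpendicular to u) flips u.
        return tuple(rotate(e, basis[1], math.pi) for e in basis)
    angle = math.acos(max(-1.0, min(1.0, dot(u, vn))))
    return tuple(rotate(e, normalize(axis), angle) for e in basis)


def second_degree_angles(shoulder, elbow, hand, torso_basis):
    """Spherical angles of W = hand - elbow in the basis rotated onto V."""
    V = tuple(e - s for e, s in zip(elbow, shoulder))
    W = tuple(h - e for h, e in zip(hand, elbow))
    u, r, t = rotated_basis(torso_basis, V)
    w = normalize(W)
    theta = math.acos(max(-1.0, min(1.0, dot(w, u))))  # zenith from rotated u
    mu = math.atan2(dot(w, t), dot(w, r))              # azimuth in r-t plane
    return theta, mu
```

With this convention θ is simply the bend angle between the upper arm V and the forearm W, which is why a rotated basis is used instead of the raw torso basis.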
 
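Each joint thus corresponds to two coordinates in its spherical coordinate system. We also compute the angle η between the direction vector z of the RGB-D sensor and the inverted torso-basis vector -t, in order to detect torso inclination. The final body joint angle vector v consists of the (θ, μ) pairs of the joints together with η.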
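The inclination angle η is the ordinary angle between two vectors. The sketch below assumes z and t are given in the same coordinate frame, and assembles the angle vector as a plain concatenation of the per-joint (θ, μ) pairs followed by η; that ordering is an assumption.

```python
import math


def torso_inclination(z, t):
    """Angle eta between the sensor direction z and the inverted vector -t."""
    neg_t = tuple(-c for c in t)
    d = sum(a * b for a, b in zip(z, neg_t))
    nz = math.sqrt(sum(a * a for a in z))
    nt = math.sqrt(sum(a * a for a in neg_t))
    return math.acos(max(-1.0, min(1.0, d / (nz * nt))))


def angle_vector(joint_angles, eta):
    """Flatten [(theta, mu), ...] and append eta (assumed layout)."""
    v = []
    for theta, mu in joint_angles:
        v.extend([theta, mu])
    v.append(eta)
    return v
```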
 
Afterward, we select the following pairwise relative position features:
- m: the relative position between the torso center and the hands
- n: the relative position between the torso center and the feet
Thus, the vector formed by m and n is used as the position feature of the action.

The final APJ3D feature combines these relative position features with the joint angle features:
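A minimal sketch of assembling one frame's APJ3D vector, assuming it is a plain concatenation of m, n and the joint angle vector; the joint names ("torso", "hand_l", and so on) are hypothetical.

```python
def relative_to_torso(joints, names, torso="torso"):
    """Concatenate (joint - torso) offsets for the given joint names."""
    origin = joints[torso]
    out = []
    for name in names:
        out.extend(p - o for p, o in zip(joints[name], origin))
    return out


def apj3d(joints, angle_vector):
    """Position part (m, n) followed by the joint angle part (assumed order)."""
    m = relative_to_torso(joints, ["hand_l", "hand_r"])  # torso -> hands
    n = relative_to_torso(joints, ["foot_l", "foot_r"])  # torso -> feet
    return m + n + list(angle_vector)
```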
 

 
Fourier Temporal Pyramid
 
We propose to use the improved Fourier Temporal Pyramid to represent the temporal dynamics of these frame-level features and to address the temporal interval problem.
 
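Each action appears as a continuously changing sequence of APJ3D features; after K-means clustering, each action is represented as a sequence of key postures.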
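The clustering step can be sketched with plain Lloyd's K-means. The note gives neither K nor the initialization; the deterministic farthest-point initialization here is our choice, made so the sketch is reproducible. Each frame is then replaced by the index of its nearest center (its key posture).

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the farthest point."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points, k, iters=100):
    """Lloyd's K-means on a list of equal-length APJ3D vectors."""
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        new_centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else centers[c]
            for c, members in enumerate(clusters)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers


def key_postures(frames, centers):
    """Map each frame to the index of its nearest cluster center."""
    return [min(range(len(centers)), key=lambda c: sq_dist(f, centers[c])) for f in frames]
```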
 
To capture the temporal structure of the action, in addition to the global Fourier coefficients, we recursively partition the action into a pyramid and apply the short-time Fourier transform to all the segments. The final feature is the concatenation of the Fourier coefficients from all the segments.
The improvement is as follows:
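A minimal sketch of the pyramid for one feature dimension g(t), assuming each level halves the segments of the previous one (1, 2, 4, 8 segments for 4 levels) and that the short-time Fourier transform is a plain DFT over each non-overlapping segment (a rectangular window). The numbers of low- and high-frequency magnitudes kept (n_low, n_high) are hypothetical parameters.

```python
import cmath
import math


def dft_magnitudes(x):
    """One-sided DFT magnitudes |X_k| for k = 0..N//2 of a real sequence."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def low_high(mags, n_low, n_high):
    """Keep the n_low lowest and n_high highest frequency magnitudes.

    Short segments are zero-padded so every segment contributes the same
    number of values.
    """
    need = n_low + n_high
    if len(mags) < need:
        mags = mags + [0.0] * (need - len(mags))
    return mags[:n_low] + mags[len(mags) - n_high:]


def fourier_pyramid(g, levels=4, n_low=3, n_high=2):
    """Concatenate low/high-frequency magnitudes over a temporal pyramid."""
    feats = []
    n = len(g)
    for level in range(levels):
        parts = 2 ** level  # assumed: each level halves the segments
        for s in range(parts):
            seg = g[s * n // parts:(s + 1) * n // parts] or [0.0]
            feats.extend(low_high(dft_magnitudes(seg), n_low, n_high))
    return feats


def sequence_feature(frames, **kw):
    """Apply the pyramid to every dimension of a sequence of APJ3D vectors."""
    return [v for dim in zip(*frames) for v in fourier_pyramid(list(dim), **kw)]
```

With the defaults, each dimension yields (1 + 2 + 4 + 8) segments × 5 values = 75 numbers.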
For each key posture s, its overall feature vector is the concatenation of p, its 3D pairwise position vector, and v, its 3D joint angle vector.
Note that each element g of this vector is a function of time, written g(t). For each time segment at each pyramid level, we apply the short-time Fourier transform to g(t) to obtain its Fourier coefficients, and use both its low-frequency and high-frequency coefficients as features.
The low-frequency coefficients keep the feature robust to noise, while the high-frequency coefficients capture abrupt changes in the action.
After the Fourier transform, the features are no longer sensitive to temporal shifts, because time series that differ only by a temporal translation have the same Fourier coefficient magnitudes, and the temporal structure of the action is characterized by the pyramid structure.
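The shift-invariance argument can be checked numerically. The equality is exact for a circular shift, where only the phases change; for an ordinary (non-circular) shift inside a finite window it holds only approximately. The signal below is arbitrary test data.

```python
import cmath
import math


def dft(x):
    """Full DFT X_k for k = 0..N-1."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


signal = [0.0, 1.0, 3.0, 2.0, 0.0, -1.0, -2.0, 0.0]
shifted = signal[3:] + signal[:3]  # circular shift by 3 samples

mag = [abs(c) for c in dft(signal)]
mag_shifted = [abs(c) for c in dft(shifted)]
# Magnitudes agree up to floating-point error; the phases do not.
same = all(abs(a - b) < 1e-9 for a, b in zip(mag, mag_shifted))
```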
 
In the implementation, the action is divided into a 4-level pyramid.
 

 
Random forest training
 
The features extracted from the training set are used to train a random forest classifier, which is an ensemble of randomized decision trees. For each decision tree, W segment features are randomly selected from the training set and placed at the root node, then mapped to a set of terminal leaf nodes through interior binary split nodes.
 
At each interior node, f variables are randomly selected out of the F feature dimensions, and the decision threshold T is chosen correspondingly within a range of candidate values. The splitting function is defined as:
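The splitting function itself is missing from the note, so the sketch below uses a common single-feature threshold test as a stand-in: it draws f of the F dimensions at random, draws a threshold for each uniformly between that dimension's minimum and maximum at the node (an assumed range), and sends a sample left when its value is below T.

```python
import random


def random_split_candidates(samples, F, f, rng):
    """Draw f random dimensions out of F, each with a random threshold.

    The threshold range [min, max] of the node's values is an assumption.
    """
    candidates = []
    for d in rng.sample(range(F), f):
        values = [x[d] for x in samples]
        candidates.append((d, rng.uniform(min(values), max(values))))
    return candidates


def split(samples, d, T):
    """Binary split: left if x[d] < T, otherwise right."""
    left = [x for x in samples if x[d] < T]
    right = [x for x in samples if x[d] >= T]
    return left, right
```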
 
 
To measure the training quality of each leaf node, that is, the proportion of segments from sequences of the same action that fall into the same leaf node, an information gain is defined at each split node:
Information gain
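Since the note's formula is missing, here is the standard Shannon-entropy information gain as a reference point; whether the paper uses exactly this form is an assumption. A split that separates the classes cleanly gets the highest gain.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy reduction when parent labels are split into left/right."""
    n = len(parent)
    return (entropy(parent)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))
```

During training, one would keep the candidate split with the largest gain at each node.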
 
In the testing stage, each segment feature is pushed to the root node of each decision tree in the random forest classifier and is eventually forwarded to a terminating leaf node. The path between the root node and a terminating leaf node consists of a set of split nodes, and each split node contains a binary splitting function.
 
When a segment feature reaches a terminating leaf node, the histogram P stored at that leaf gives the proportion of training segments per class label that fell into this leaf during the training stage; this is the soft voting result of that decision tree. Finally, the prediction histogram of the whole forest is obtained by summing the voting histograms of all the decision trees:
$P(c) = \sum_{k=1}^{N} P_k(c)$
where N is the number of trees and P_k(c) is the vote of tree k for class c.
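The soft voting can be sketched as follows. Leaf histograms are represented as {class: proportion} dictionaries (our assumed representation); the forest sums them and predicts the class with the largest total.

```python
from collections import Counter


def leaf_histogram(train_labels):
    """Proportion of training segments per class that reached a leaf."""
    n = len(train_labels)
    return {c: k / n for c, k in Counter(train_labels).items()}


def forest_predict(leaf_histograms):
    """Sum the per-tree soft votes; return (best class, summed histogram)."""
    total = Counter()
    for hist in leaf_histograms:
        total.update(hist)  # adds the proportions class by class
    return max(total, key=total.get), dict(total)
```

Summing proportions instead of counting hard votes lets a tree whose leaf was mixed contribute a weaker vote than a tree whose leaf was pure.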
 
Thanks to the Fourier transform, the whole recognition system holds up very well against noise.
 


 
http://ojs.academypublisher.com/index.php/jsw/article/view/jsw080922382245