This paper proposes a sparse representation method for human interaction. A trajectory feature, which captures global changes, is fused with a spatio-temporal feature, which emphasizes local movement. First, the sparse representation of the trajectory feature is obtained with a bag-of-words model. Then, multi-level spatio-temporal features are produced by a three-layer spatio-temporal pyramid and processed by sparse coding, and a multi-scale max pooling algorithm is applied to obtain the local sparse feature. Finally, the two kinds of sparse features are weighted and concatenated to obtain the sparse representation of human interaction. A dynamic latent conditional random field model is used to verify the proposed sparse representation, and the experimental results demonstrate its effectiveness.
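
As a rough illustration of the pipeline summarized above, the following Python sketch shows one way the three feature-level steps could be arranged: a bag-of-words histogram for the trajectory feature, max pooling of sparse codes over a three-level pyramid, and a weighted concatenation of the two results. It is a minimal sketch under stated assumptions, not the implementation used in the paper. The function names (`bow_histogram`, `pyramid_max_pool`, `fuse`), the 2^l cells per axis at level l, the pooling of absolute values, the normalized (x, y, t) positions, and the single scalar `weight` are all hypothetical choices made for illustration. The sparse codes are assumed to be computed beforehand by any sparse coder, and the conditional random field classifier is not covered.

```python
import numpy as np


def bow_histogram(descriptors, codebook):
    """Bag-of-words: normalized histogram of nearest-codeword assignments."""
    # Squared Euclidean distance between every descriptor and every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)


def pyramid_max_pool(codes, positions, levels=3):
    """Max-pool sparse codes over a spatio-temporal pyramid.

    codes:     (n, k) precomputed sparse codes, one row per local feature.
    positions: (n, 3) feature locations (x, y, t), normalized to [0, 1).
    Assumption: level l splits each axis into 2**l cells.
    """
    pooled = []
    for level in range(levels):
        cells = 2 ** level
        # Cell index of each feature along x, y and t at this level.
        idx = np.minimum((positions * cells).astype(int), cells - 1)
        flat = (idx[:, 0] * cells + idx[:, 1]) * cells + idx[:, 2]
        for c in range(cells ** 3):
            mask = flat == c
            if mask.any():
                # Pool the largest absolute response per codeword in the cell.
                pooled.append(np.abs(codes[mask]).max(axis=0))
            else:
                # Empty cells contribute a zero vector.
                pooled.append(np.zeros(codes.shape[1]))
    return np.concatenate(pooled)


def fuse(global_feat, local_feat, weight=0.5):
    """Weighted concatenation; `weight` is a hypothetical trade-off parameter."""
    return np.concatenate([weight * global_feat, (1.0 - weight) * local_feat])
```

With the default three levels, the pooled local feature has (1 + 8 + 64) × k entries for a codebook of k atoms, so the relative weight of the two parts would typically need tuning on validation data.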