Affective computing is an emerging interdisciplinary research field with great potential for recognizing, understanding, and expressing human emotions. Recently, multimodal analysis has been gaining popularity in affective studies, as it can provide a more comprehensive view of emotion dynamics by drawing on the diverse and complementary information carried by different data modalities. However, the stability and generalizability of current multimodal analysis methods have not yet been thoroughly established. In this paper, we propose a novel multimodal analysis method (EEG-AVE, EEG with audio-visual embedding) for cross-individual affective detection, where EEG signals are exploited to identify emotion-related individual preferences and audio-visual information is leveraged to estimate the intrinsic emotions conveyed by the multimedia content. EEG-AVE is composed of two main modules. For the EEG-based individual preference prediction module, a multi-scale domain adversarial neural network is developed to learn dynamic, informative, and domain-invariant EEG features shared across individuals. For the video-based intrinsic emotion estimation module, a deep audio-visual feature-based hypergraph clustering method is proposed to examine the latent relationship between semantic audio-visual features and emotions. Through an embedding model, the estimated individual preferences and intrinsic emotions are combined with shared weights and jointly contribute to affective detection across individuals. Experiments on two well-known emotional databases show that the proposed EEG-AVE model achieves superior performance under a leave-one-individual-out cross-validation, individual-independent evaluation protocol. The results demonstrate that EEG-AVE is an effective model with good reliability and generalizability, and that it holds practical significance for the development of multimodal analysis in affective computing.
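To make the two core ideas concrete, the sketch below illustrates, in PyTorch and under assumed settings, (1) a domain-adversarial EEG feature extractor that uses a gradient reversal layer to learn individual-invariant features, and (2) a shared-weight fusion of EEG-based preference predictions with video-based intrinsic emotion estimates. All names (e.g., `DomainAdversarialEEG`, `fuse_predictions`), layer sizes, and the fusion rule are illustrative assumptions; the multi-scale design and the hypergraph clustering module of EEG-AVE are not reproduced here.

```python
# Minimal sketch (not the authors' implementation): a DANN-style EEG feature
# extractor with gradient reversal, plus a shared-weight fusion of EEG-based
# and audio-visual-based emotion probabilities. Shapes are assumptions.

import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class DomainAdversarialEEG(nn.Module):
    """EEG feature extractor with an emotion classifier and a domain
    (individual) discriminator trained adversarially via gradient reversal."""

    def __init__(self, in_dim=310, feat_dim=128, n_emotions=3, n_individuals=14):
        super().__init__()
        self.feature = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim), nn.ReLU(),
        )
        self.emotion_head = nn.Linear(feat_dim, n_emotions)    # label predictor
        self.domain_head = nn.Linear(feat_dim, n_individuals)  # individual discriminator

    def forward(self, x, lambd=1.0):
        feat = self.feature(x)
        emotion_logits = self.emotion_head(feat)
        # Gradient reversal pushes the feature extractor to fool the domain head,
        # encouraging EEG representations that are invariant across individuals.
        domain_logits = self.domain_head(GradReverse.apply(feat, lambd))
        return emotion_logits, domain_logits


def fuse_predictions(eeg_probs, av_probs, weight=0.5):
    """Shared-weight combination of EEG-based individual preference predictions
    and audio-visual intrinsic emotion estimates (both as class probabilities)."""
    return weight * eeg_probs + (1.0 - weight) * av_probs


if __name__ == "__main__":
    model = DomainAdversarialEEG()
    eeg = torch.randn(8, 310)                    # batch of EEG feature vectors (assumed size)
    emotion_logits, domain_logits = model(eeg, lambd=0.5)
    eeg_probs = emotion_logits.softmax(dim=-1)
    av_probs = torch.rand(8, 3).softmax(dim=-1)  # stand-in for video-based estimates
    fused = fuse_predictions(eeg_probs, av_probs, weight=0.6)
    print(fused.shape)                           # torch.Size([8, 3])
```

In a leave-one-individual-out setting such as the one described above, the domain head would be trained on the source individuals while the held-out individual is used only for evaluation; the fusion weight is a free parameter in this sketch.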