Audio-visual aesthetic teaching methods in college students’ vocal music teaching by deep learning

Experimental materials
Retrieval data based on STEAM education
Drawing upon the antecedent literature research, the keyword for retrieval is “STEM/STEAM education.” Subsequently, the assembled corpus of relevant scholarly articles on the identified subject is presented in Fig. 4.

Fig. 4. Annual statistics of publications related to STEAM education.
Figure 4 highlights a discernible surge in the volume of scholarly papers concerning STEAM education, particularly notable following the announcement of the STEM education law in the United States in 2015. The United States remains at the forefront of exploring the dimensions of STEAM education, evidenced by its expanding coverage of related research topics. The integration of STEAM education is now inseparably linked to keywords such as educational philosophy, educational literacy, and interdisciplinary studies. The data underscores the steady progression of STEAM education as a prominent and widely accepted pedagogical approach, signifying its ascent into the realm of mainstream educational methodologies duly acknowledged and embraced by educational practitioners.
Comparison of old and new teaching concepts
This research conducts a one-week teaching comparison experiment in art appreciation to assess the advantages of the aesthetic teaching method based on STEAM education and deep learning over traditional teaching. Two classes, Classes A and B, from University Z, are selected for this experiment. Class A adopts a curriculum that is developed and enhanced through the lens of STEAM education, with the teacher designing specific content for the three-stage teaching process based on the strategies and process model of deep learning. Class B, by contrast, follows the conventional audio-visual aesthetic teaching methods typically employed in colleges and universities. After the teaching sessions are completed, a comprehensive questionnaire is administered to gauge the efficacy of the two teaching methodologies, and a comparative analysis is conducted to discern the impact of each approach. The statistical outcomes pertaining to the course content of Class A are presented in Fig. 5, while those of Class B are depicted in Fig. 6.

Fig. 5. Scoring of class A course content evaluation items.

Fig. 6. Scoring of class B course content evaluation items.
As shown in Figs. 5 and 6, Class A demonstrates a higher average score than Class B. This observation suggests that incorporating the three-stage educational process model, guided by the integration of STEAM and deep learning principles, substantially improves the overall teaching effectiveness. Most students demonstrate improved learning efficiency through this novel teaching approach, which not only fosters their audio-visual aesthetic proficiency but also bolsters their self-confidence in the learning process.
Regarding course performance assessment, three key aspects are considered for scoring: group mutual evaluation, teacher evaluation, and self-evaluation. The statistical results for Classes A and B, employing a scoring criterion of 3 points per item, are presented in Figs. 7 and 8, respectively.

Fig. 7. Scoring of class A course performance evaluation items.

Fig. 8. Scoring of class B course performance evaluation items.
As illustrated in Fig. 8, within the course performance segment, each class is organized into four groups, each comprising 10 students. Students in Class A exhibit a heightened capacity to accurately reflect upon their own perspectives, effectively summarize learning outcomes, and adeptly address learning challenges. Remarkably, 55% of Class A students articulate the presence of clear learning objectives and methods. They demonstrate adaptability in modifying their learning approaches based on task requirements and manifest a lucid grasp of their learning achievements. During peer evaluations within their groups, 60% of Class A students acknowledge the ability of group members to accommodate differing viewpoints, effectively fulfilling their individual responsibilities while fostering mutual coordination among group members, thereby enhancing overall learning efficiency. Compared to Class B, Class A students demonstrate a more precise ability to position works according to their appreciation, along with a more comprehensive understanding of emotional expression and specialized knowledge inherent in the works. Their application of aesthetic principles extends beyond the confines of the course to offer novel perspectives on appreciating works from the standpoint of other disciplines. In terms of teaching design, STEAM education-based courses revolve predominantly around the work’s central theme, emphasizing the process’s logical progression and its interconnectedness with other disciplines. In contrast, traditional audio-visual aesthetic education primarily centers on the course itself, potentially deviating from actual aesthetic demands and possessing certain limitations.
The three-stage education model, which amalgamates STEAM education with deep learning, exhibits a profound consideration of students’ learning conditions while harnessing the potent analytical capabilities of computers in education. This leads to a notable enhancement in teaching efficiency and flexibility. Notably, STEAM education implementation in colleges and universities has garnered high acclaim from educators, underscoring the current national focus on students’ comprehensive development and the promotion of interdisciplinary learning initiatives.
Research methodology and curriculum evaluation
This experiment used a comparative research approach to evaluate the quality of audio-visual aesthetics education. Purposeful sampling was used to select two equally sized classes so that the comparison would be representative. Consent was obtained from both students and teachers to ensure the legitimacy and ethicality of the research process. Surveys were conducted to collect feedback from students and teachers, and the gathered data were then analyzed and evaluated. Statistical methods were employed to assess the reliability of the collected questionnaire data. The participants were college students aged between 18 and 25, from urban and rural areas and from different cultural backgrounds, covering a variety of majors and learning experiences. Most of the participants had received higher education and had a certain foundation in music and visual arts. The diversity of these participants enhances the applicability of the research results in different educational environments. In addition, the data collection process paid special attention to the participants’ backgrounds in audio-visual aesthetic education to ensure that the collected data reflect their learning experiences in their respective cultural and educational environments.
The Art Appreciation Elective offered by Z University serves as the foundation for this comparative study. Two equally sized classes, denoted as Classes A and B, are purposefully selected for the teaching comparison. Their courses are primarily centered on music appreciation, with the addition of video appreciation to evaluate the audio-visual aesthetic teaching quality. The selected works for appreciation encompass traditional musical compositions such as A Parting Tune with A Thrice Repeated Refrain and High Mountains and Flowing Water. The instructional duration spans one week, encompassing three distinct stages: pre-learning, interactive learning, and extension learning. Throughout the course, various audio-visual materials are available to enrich the learning experience. Upon the week-long teaching sessions’ conclusion, students and teachers are provided with pertinent questionnaires to gather their feedback and insights. These questionnaires are subsequently collected for analysis and evaluation.
Regarding the course content, the enhanced curriculum integrating deep learning and STEAM education encompasses a pre-learning phase wherein students actively engage in learning and comprehension. Subsequently, the teaching effectiveness is evaluated based on three key aspects: aesthetic understanding, artistic perception, and cultural interpretation, each scored according to the instructor’s assessment. Three components of course performance evaluation are appraised: self-assessment, group peer assessment, and teacher evaluation, each contributing to the evaluation of teaching quality. The specific scoring criteria and details for these evaluations are provided in Table 2.
Each item in the evaluation is scored out of 15 points, with the following grading scale: (1) 1–5 points: average performance; (2) 6–10 points: good performance; (3) 11–15 points: excellent performance. Course performance is evaluated within the context of cooperative learning, encompassing three critical aspects: self-evaluation, group mutual evaluation, and teacher evaluation. The detailed scoring criteria for each of these aspects are outlined in Table 3.
The evaluation process involves scoring and assessing three specific aspects of course performance based on designated teaching indicators. Following this assessment, the collected questionnaires are subjected to statistical analysis.
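The 15-point grading scale for the course content items can be written as a small lookup. The sketch below is illustrative only: the function name `performance_band` and the rejection of scores outside 1–15 are assumptions, not part of the original study.

```python
def performance_band(score: int) -> str:
    """Map a 15-point item score to the band named in the grading scale."""
    if not 1 <= score <= 15:
        # Assumption: scores outside the stated 1-15 range are treated as invalid.
        raise ValueError(f"score must be between 1 and 15, got {score}")
    if score <= 5:
        return "average"
    if score <= 10:
        return "good"
    return "excellent"


# Hypothetical scores for the three course content aspects of one student.
example_scores = {
    "aesthetic understanding": 12,
    "artistic perception": 9,
    "cultural interpretation": 4,
}
example_bands = {item: performance_band(s) for item, s in example_scores.items()}
print(example_bands)
```

The example scores are invented for demonstration and do not come from the questionnaire data.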
Questionnaire reliability analysis
The analysis of the questionnaire reliability treats each individual question within the questionnaire as equivalent. The reliability can be conceptualized as the ratio between the signal variance and the total variance, as expressed in Eq. (1) and Eq. (2).
$$S_X^2=S_T^2+S_E^2$$
(1)
$$r=\frac{S_T^2}{S_X^2}$$
(2)
In Eq. (1) and Eq. (2), the subscripts X, T, and E denote the score, the signal, and the noise, respectively, so that \(S_X^2\), \(S_T^2\), and \(S_E^2\) are the corresponding variances. The reliability coefficient, denoted by r, indicates the extent of relevance among the questions within the questionnaire. Its value increases with the level of consistency among the questions. If the questionnaire comprises k equivalent questions, the score variance can be represented by a covariance matrix, as exemplified in Eq. (3).
$$c = \left[ \begin{array}{ccc} \sigma_{1,1}^2 & \cdots & \sigma_{1,k} \\ \vdots & \ddots & \vdots \\ \sigma_{k,1} & \cdots & \sigma_{k,k}^2 \end{array} \right]$$
(3)
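The sum of the diagonal elements in the matrix is denoted by \(\sum \sigma_i^2\), and the variance of the total score, which equals the sum of all elements of the matrix, is denoted by \(\sigma_Y^2\). The derivation of the reliability coefficient is presented in Eq. (4).
$$1 - \frac{\sum \sigma_i^2}{\sigma_Y^2}$$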
(4)
To ensure self-consistency of the equation, Eq. (4) is multiplied by \(\frac{k}{k-1}\), resulting in the general expression for Cronbach’s α, as shown in Eq. (5):
$$\alpha = \frac{k}{k-1}\left( 1 - \frac{\sum \sigma_i^2}{\sigma_Y^2} \right)$$
(5)
The reliability analysis, employing Cronbach’s α as the measure, primarily relies on covariance. Writing \(\bar{v}\) for the mean variance of the k questionnaire items, the sum of the item variances equals k times this mean, as depicted in Eq. (6).
$$\sum \sigma_i^2 = k\bar{v}$$
(6)
Similarly, the sum of the \(k^2 - k\) inter-item covariances can be expressed through their mean \(\bar{c}\), so that the total variance is given by Eq. (7).
$$\sigma_Y^2 = k\bar{v} + (k^2 - k)\bar{c}$$
(7)
Substituting Eq. (6) and Eq. (7) into Eq. (5) yields Eq. (8).
$$\alpha = \frac{k}{k-1}\left( 1 - \frac{k\bar{v}}{k\bar{v} + k\left( k - 1 \right)\bar{c}} \right)$$
(8)
Equation (8) can be simplified as Eq. (9).
$$\alpha=\frac{k\bar{c}}{\bar{v}+\left(k-1\right)\bar{c}}$$
(9)
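The simplification from Eq. (8) to Eq. (9) follows from putting the bracketed term over a common denominator and cancelling the factors \(k\) and \(k-1\). The intermediate steps below are filled in here for clarity and are not part of the original derivation:

$$\begin{aligned} \alpha &= \frac{k}{k-1}\cdot\frac{k\bar{v} + k(k-1)\bar{c} - k\bar{v}}{k\bar{v} + k(k-1)\bar{c}} \\ &= \frac{k}{k-1}\cdot\frac{k(k-1)\bar{c}}{k\left(\bar{v} + (k-1)\bar{c}\right)} \\ &= \frac{k\bar{c}}{\bar{v} + (k-1)\bar{c}} \end{aligned}$$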
When the items are standardized to unit variance, \(\bar{v} = 1\) and the mean covariance \(\bar{c}\) in Eq. (9) becomes the mean inter-item correlation \(\bar{r}\), leading to the mathematical representation of the reliability analysis depicted in Eq. (10).
$$\alpha=\frac{k\bar{r}}{1+\left(k-1\right)\bar{r}}$$
(10)
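As a numerical check on Eqs. (5), (9), and (10), the following minimal sketch computes Cronbach’s α in the variance form, in the mean-covariance form, and in the standardized (correlation) form. The function names and the synthetic score matrix are hypothetical and serve only as an illustration; they are not the study’s questionnaire data, and NumPy is assumed to be available.

```python
import numpy as np


def cronbach_alpha(scores) -> float:
    """Eq. (5): alpha from the item variances and the total-score variance."""
    scores = np.asarray(scores, dtype=float)  # rows: respondents, columns: items
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1)       # sigma_i^2 for each item
    total_var = scores.sum(axis=1).var(ddof=1)  # sigma_Y^2 of the summed score
    return k / (k - 1) * (1 - item_var.sum() / total_var)


def cronbach_alpha_cov(scores) -> float:
    """Eq. (9): alpha from the mean item variance and mean inter-item covariance."""
    cov = np.cov(np.asarray(scores, dtype=float), rowvar=False, ddof=1)
    k = cov.shape[0]
    v_bar = np.trace(cov) / k                          # mean of diagonal entries
    c_bar = (cov.sum() - np.trace(cov)) / (k * k - k)  # mean of off-diagonal entries
    return k * c_bar / (v_bar + (k - 1) * c_bar)


def standardized_alpha(scores) -> float:
    """Eq. (10): alpha from the mean inter-item correlation."""
    corr = np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)
    k = corr.shape[0]
    r_bar = (corr.sum() - k) / (k * k - k)  # diagonal of a correlation matrix is 1
    return k * r_bar / (1 + (k - 1) * r_bar)


# Synthetic data: 40 hypothetical respondents, 5 items sharing one latent trait.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 1))
demo_scores = latent + 0.8 * rng.normal(size=(40, 5))

print("alpha, Eq. (5): ", cronbach_alpha(demo_scores))
print("alpha, Eq. (9): ", cronbach_alpha_cov(demo_scores))
print("alpha, Eq. (10):", standardized_alpha(demo_scores))
```

Eqs. (5) and (9) are algebraically identical, so the first two values agree up to floating-point error. The standardized form of Eq. (10) generally differs slightly unless all items have equal variance.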
The reliability analysis employs the measure of Cronbach’s α reliability coefficient, as illustrated in Fig. 9.

Fig. 9. Cronbach’s α reliability evaluation criteria.
Figure 9 shows the confidence levels associated with different values of Cronbach’s α; according to established standards, a reliability coefficient exceeding 0.7 is considered reliable. Consistent with the principles and measurement criteria for reliability testing, the analysis of the questionnaire’s reliability is presented in Table 4.
In Table 4, the overall reliability score of the questionnaire reaches 0.736, indicating credible results.
Specific applications of deep learning technology in audio-video aesthetic teaching
In instructional design, deep learning technology is mainly applied to audio-video aesthetic education in the following ways:
(1) Emotion recognition: The convolutional neural network in deep learning is utilized to analyze students’ facial expressions and emotional responses when watching videos and listening to music during the learning process. Through emotion recognition technology, teachers can capture students’ emotional changes in real time, such as excitement, calmness, and concentration, and thus adjust teaching strategies and provide personalized guidance according to students’ emotional feedback.
(2) Audio analysis: This research adopts the time-frequency image analysis technology in deep learning (such as short-time Fourier transform combined with deep learning models) to analyze the audio data of musical works. It intends to help students understand the rhythm, melody, and emotional expression in music more deeply. This technology can automatically generate music score analysis and emotional features, enhancing students’ music perception ability.
(3) Visual data processing: In the video appreciation session, deep learning analyzes elements such as color, composition, and movement trajectory in videos to help students feel the beauty of artworks more comprehensively and visually. This not only helps improve students’ visual aesthetic ability but also enables students to associate and compare the visual elements in audio-video works with musical expressions.
Through the introduction of these technologies, the teaching process can provide feedback on students’ learning status more efficiently and accurately and help students achieve significant improvement in emotional expression and artistic perception ability.
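The audio analysis described in item (2) starts from a time-frequency image of the music. A minimal sketch of how such an image could be computed with a short-time Fourier transform, using NumPy only, is given below. The function name `stft_magnitude`, the frame and hop sizes, and the synthetic 440 Hz test tone are assumptions made for illustration. The deep learning model that would take the spectrogram as input is not shown, and the source does not specify the actual pipeline used.

```python
import numpy as np


def stft_magnitude(signal, sample_rate, frame_len=1024, hop=256):
    """Return (freqs, times, magnitude) of a Hann-windowed short-time Fourier transform."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each one.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, frame_len // 2 + 1)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate  # frame centers, s
    return freqs, times, magnitude


# Synthetic one-second 440 Hz tone standing in for an excerpt of a musical work.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

freqs, times, spec = stft_magnitude(tone, sample_rate)
dominant = freqs[spec.mean(axis=0).argmax()]  # frequency bin with the most energy
print("spectrogram shape:", spec.shape)
print("dominant frequency (Hz):", dominant)
```

The returned magnitude array is the kind of two-dimensional image that a convolutional model could take as input; the frequency resolution is limited to `sample_rate / frame_len`, so the dominant bin only approximates the true pitch.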
Challenges and limitations of deep learning in music education
As a core technology in the field of artificial intelligence, deep learning has shown great potential in art education in recent years. In addition to its application in college vocal music teaching, it can also play an important role in other fields of art education and provide new possibilities for interdisciplinary research. In art education, deep learning can be used in aspects such as image generation, style transfer, and art authentication. Through generative adversarial networks, students can blend their own creations with the styles of famous artists to create works with unique artistic styles. This not only stimulates students’ creative enthusiasm but also deepens their understanding of different art schools and techniques. Meanwhile, deep learning technology can also help authenticate the authenticity of artworks and provide technical support for the art market and art history research. In dance education, deep learning can be used for motion capture and posture analysis to help students correct dance movements and improve expressiveness. By recording students’ dance movements with camera equipment, deep learning models can analyze and provide feedback in real time, point out deficiencies in movements, and provide personalized practice suggestions. This technology not only improves teaching efficiency but also enhances students’ learning initiative. In the fields of drama and film education, deep learning can be used for emotion recognition and character analysis. Through the analysis of performers’ facial expressions and voices, deep learning models can evaluate the accuracy of their emotional expressions and help students better understand the psychology of characters. Moreover, deep learning can also be used in scriptwriting to assist screenwriters in generating plot developments and dialogues and stimulate creative inspiration. 
Combining deep learning with STEAM education helps cultivate students’ interdisciplinary thinking and comprehensive abilities. The advanced technical means provided by deep learning can enrich the teaching content and forms of STEAM education. For example, in music education, using deep learning technology for audio analysis and generation can help students understand music structure and creative principles more deeply; in engineering education, combining deep learning for music production and arrangement can cultivate students’ technical application ability and artistic creation ability. This integration not only enriches teaching content but also cultivates students’ comprehensive qualities and innovation abilities. With the continuous development of deep learning technology, its application prospects in art education will be even broader. Future interdisciplinary research can further explore the combination methods of deep learning and different art forms and develop more intelligent tools and platforms suitable for education. In addition, through interdisciplinary collaborative research, deep integration of technology and art can be achieved, promoting innovation and the development of education models.
Although deep learning technology has shown significant advantages in audio-video aesthetic teaching, there are still some challenges and limitations in specific applications:
(1) Demand for hardware resources: The operation of deep learning algorithms usually requires strong computing power and a large amount of data support. However, many colleges and universities may face problems such as insufficient hardware equipment or imperfect infrastructure. This limits the widespread application of deep learning technology to a certain extent.
(2) Technical ability requirements: For front-line teachers, using deep learning technology for instructional design and implementation may require a certain technical background or professional training. Teachers need to master basic programming skills and the usage methods of deep learning tools, which undoubtedly increases the complexity of teaching implementation.
(3) Data acquisition and privacy issues: Deep learning models need a large amount of student data (such as emotional responses and audio responses) for training and optimization, which involves data collection, storage, and privacy protection. In educational institutions, how to collect and use these data legally and compliantly is an aspect that requires special attention.
(4) Teaching resources and costs: The introduction of deep learning technology may mean the need to purchase specialized teaching software, training datasets, and corresponding hardware equipment. These resources may not be easily accessible in some educational institutions. In particular, schools with limited resources may find it difficult to implement this innovative teaching model.
This discussion of challenges is intended to provide a useful reference for other researchers and educators who use deep learning technology and to promote the reasonable and effective application of this technology in education.
Discussion
According to Harlen (2016), the development of a curriculum grounded in the principles of STEAM is feasible and emphasizes the need to enhance students’ potential and foster their comprehensive abilities, aligning with the viewpoint espoused in this research [32]. Li (2016) employed diverse evaluation methods in teaching development, combining STEAM education with maker education, and introduced novel evaluation approaches to drive educational advancement [33]. In line with these perspectives, this research adopts STEAM education to propose a novel teaching model, subsequently evaluating and comparing its efficacy against the traditional teaching approach. The research findings affirm the superior efficiency of the new teaching model.
Additionally, Hsiao and Su (2021) integrated the concept of sustainable development into VR-assisted STEAM education, providing students with comprehensive interdisciplinary STEAM education. Their findings show that the combination of STEAM education and VR-assisted experiential courses improves students’ learning satisfaction and outcomes while strengthening their motivation to learn [34]. Consistent with the results of this research, both studies demonstrate that STEAM education can enhance students’ interest in learning. Ozkan and Umdu Topsakal (2021) investigated the effectiveness of STEAM education in cultivating the conceptual understanding of force and energy topics among 13–14-year-old students. They conducted experiments with 7th-grade students in experimental and control groups. The results indicated that STEAM education positively influenced students’ conceptual understanding, reducing or transferring misconceptions. Moreover, the experimental group’s post-test conceptual understanding scores were significantly higher than those of the control group [35]. Their experiment, like this research, demonstrated that STEAM education can strengthen students’ comprehension of classroom knowledge and enhance their learning abilities. Mun (2022) explored the role of aesthetic experiences in the learning process of integrating art, science, and technology. They studied students’ experiences of creating interactive art in the context of STEAM education in South Korea. They found that through this learning experience, students recognized the limitations of their thoughts and changed their ideas by applying new scientific knowledge and skills [36]. This result indicated an improvement in students’ comprehensive abilities, which aligns with the results of this research. Utomo et al. aimed to assess the effectiveness of a STEAM biotechnology module incorporating Flash animations in high school biology instruction. The results indicated that the module was valid, with highly positive student responses. Effectiveness test results demonstrated significant learning progress among students who used the module [37]. These findings align with this research, suggesting that interdisciplinary STEAM approaches and multimedia teaching tools can enhance the effectiveness of biology education. Zheng initiated their study with an English audio-visual oral course for a 2022 law major class, using 50 original Disney English movies as core teaching materials. They further applied STEAM educational principles to enhance students’ listening and speaking skills [38]. That study shares commonalities with the current one, which also uses video material for audio-visual aesthetics education and integrates STEAM education principles.
The educational model adopted in this research places students at the forefront to harness their intrinsic motivation and foster their creative capacity through a dynamic three-stage deep learning process centered around the works. The overarching goal is to cultivate inter-disciplinary abilities among students. In light of the contemporary context and the rapid advancement of science and technology, it is imperative to continuously innovate teaching methods. Traditional modes of education, wherein teachers solely dictate information to students, fail to stimulate innovative thinking among students. Therefore, fostering students’ ability for scientific exploration necessitates a confluence of theory with practical application, allowing theoretical knowledge to evolve and flourish through hands-on exploration.
In practical teaching scenarios, the innovative curriculum design based on STEAM education and deep learning, along with the three-stage teaching process model, can effectively help students enhance their music and visual appreciation abilities and promote comprehensive qualities, fostering well-rounded development in aesthetic perspectives, emotional expression, and overall abilities. As a contemporary teaching concept, STEAM education prioritizes students’ comprehensive development and aligns with the requirements of the era of economic globalization. Combining the STEAM education concept with audio-visual aesthetic teaching provides an organic way to integrate scientific knowledge with emotional expression in the instructional approach. This novel teaching model can be promoted in higher education, positively impacting teaching quality and promoting students’ comprehensive development. The research’s evaluation of teaching effectiveness through questionnaire surveys demonstrates the superiority of the new educational model. This finding indicates that the new curriculum design concept and teaching process model can effectively enhance students’ learning outcomes and comprehensive abilities, holding significant practical significance in cultivating well-rounded college students.