Interactive learning system neural network algorithm optimization
Neural network model
Experiment: comparison of the Siamese LSTM model under different combinations of training components: different embedding files (Word2Vec or fastText), different similarity functions (cosine distance or Manhattan distance), and with or without an attention mechanism.
In the evaluation of the experimental results, this design used the training accuracy (ACC) after 25 epochs, the recall (Recall), the precision (Precision), and the harmonic mean of precision and recall (F1) to evaluate effectiveness. The four evaluation indicators are calculated as follows:
$$ACC=\frac{TP+TN}{TP+FP+TN+FN}$$
$$Recall=\frac{TP}{TP+FN}$$
$$Precision=\frac{TP}{TP+FP}$$
$$F1=\frac{2\cdot Precision\cdot Recall}{Precision+Recall}$$
Among the formulas, the specific meaning of each parameter is as follows:
TP (true positive): judged as a positive sample, and the judgment is correct;
TN (true negative): judged as a negative sample, and the judgment is correct;
FP (false positive): judged as a positive sample but actually a negative sample;
FN (false negative): judged as a negative sample but actually a positive sample.
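For concreteness, the following is a minimal Python sketch of these four indicators, assuming the confusion-matrix counts have already been tallied from the model's test predictions:

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int):
    """Compute ACC, Recall, Precision, and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return acc, recall, precision, f1
```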
As listed in Table 2, the Siamese LSTM network that uses an attention layer, Word2Vec embeddings, and the Manhattan distance as the similarity function outperforms the other combinations. These data confirm the superiority of this model combination.
Moreover, we can examine whether the use of the attention mechanism significantly widens the differences in model accuracy. Given the same question similarity calculation function and embedding file, the accuracy of the model using the attention mechanism was approximately 9% higher than that of the model without it. This further confirms the correctness of our decision to add the attention layers.
The same table also shows that the choice of question similarity calculation function affects the final accuracy. Given the same embedding file and the same use of the attention mechanism, the accuracy of the model using the Manhattan distance was over 1% higher than that of the model using the cosine function.
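As an illustration, here is a sketch of the two similarity functions compared above, applied to the final encodings h_left and h_right produced by the twin LSTM branches (variable names are ours); the exponential Manhattan form follows the original MaLSTM formulation, which maps the distance into (0, 1]:

```python
import numpy as np

def manhattan_similarity(h_left: np.ndarray, h_right: np.ndarray) -> float:
    # MaLSTM-style similarity: exp(-L1 distance), bounded in (0, 1]
    return float(np.exp(-np.sum(np.abs(h_left - h_right))))

def cosine_similarity(h_left: np.ndarray, h_right: np.ndarray) -> float:
    # Standard cosine similarity between the two sentence encodings
    return float(np.dot(h_left, h_right) /
                 (np.linalg.norm(h_left) * np.linalg.norm(h_right)))
```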
The new Quora dataset was introduced as a Kaggle competition dataset in 2017 and has since been used extensively by researchers. This dataset is a filtered and preprocessed subset of the Quora Question Pairs dataset. In 2019, Lakshay Sharma et al. established a baseline using fundamental models and explored various approaches, including tree-based models, Continuous Bag of Words (CBOW) neural networks, and Long Short-Term Memory (LSTM) recurrent neural networks24. We adopted their evaluation methodology, incorporating modifications such as using GloVe vectors to obtain word embeddings and applying Dropout for regularization. Notably, Sharma’s team identified some annotation errors and ambiguities in the source data; given the limited number of affected entries, we retained the original content without modification.
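As a sketch of this embedding modification (the tokenizer output `word_index` and the loaded `glove` dictionary are hypothetical stand-ins, not names from our codebase), the GloVe vectors can be frozen into a Keras Embedding layer, with Dropout applied afterwards for regularization:

```python
import numpy as np
from tensorflow.keras import layers, initializers

# Toy stand-ins: in practice `word_index` comes from the tokenizer and
# `glove` from the downloaded GloVe file (both names are assumptions).
word_index = {"what": 1, "is": 2, "lstm": 3}
glove = {"what": np.random.rand(300), "is": np.random.rand(300)}

embedding_matrix = np.zeros((len(word_index) + 1, 300))
for word, i in word_index.items():
    if word in glove:
        embedding_matrix[i] = glove[word]  # out-of-vocabulary rows stay zero

embedding = layers.Embedding(
    input_dim=len(word_index) + 1, output_dim=300,
    embeddings_initializer=initializers.Constant(embedding_matrix),
    trainable=False)                       # keep pretrained GloVe vectors fixed
dropout = layers.Dropout(0.5)              # Dropout for regularization
```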
As listed in Table 3, our model (Siamese LSTM + Attention) achieved commendable accuracy and F1 score compared with Sharma’s testing results.
In the most recent evaluation, we compared the test results of similar models from studies conducted between 2017 and 202225. Our model, which can be summarized as “Siamese LSTM with Manhattan Difference + Attention,” demonstrated a slight accuracy advantage, as illustrated in Table 4.
Furthermore, the comparative analysis indicates that the Siamese LSTM baseline model performed robustly in this context. Incorporating an attention mechanism yielded significantly higher accuracy (91.6%) than all baselines: exceeding the standard LSTM (78.4%) by 13.2 percentage points and surpassing the closest competitor (Siamese LSTM with Square Difference and Addition, 89.1%) by 2.5 percentage points. Additionally, different distance metrics had a minor yet discernible impact on overall model performance.
In the early stage, we also conducted tests and comparisons on other datasets. For ease of comparison, our model structure can be summarized as “MaLSTM features + Attention Mechanism.”
Reviewing past research, Lien and Kouylekov used the methodology of textual entailment in conjunction with graph-structured meaning representations, advanced semantic technologies, and formal reasoning tools to elevate their system’s metrics to the forefront among comparable general-purpose semantic parsing systems26. Bowman et al. introduced the Stanford Natural Language Inference Corpus, a freely accessible collection of 570k annotated sentence pairs, and discovered that the information learned by neural network models trained on this corpus could be leveraged to improve performance on standard datasets27. Bjerva et al. adopted a supervised approach to develop a semantic similarity system; notably, they used a random forest (RF) regressor to determine the similarity between sentence pairs, achieving an accuracy of approximately 82%29. Zhao et al. developed five advanced systems, each using an identical feature set but a distinct classifier or regressor: a support vector machine (SVM), RF, gradient boosting (GB), K-nearest neighbors (KNN), and stochastic gradient descent (SGD). These systems demonstrated favorable outcomes in the SemEval Task, achieving an accuracy approaching 84% and highlighting the effectiveness of their ECNU models31.
Mueller and Thyagarajan proposed a Siamese adaptation of an LSTM network for data labeling. Combined with an SVM, this novel model surpasses the performance of all previously implemented and more intricate neural network systems. Given that the model relies on pretrained word embeddings as inputs to the LSTM, the authors anticipate further improvements in prediction accuracy as pretrained word-embedding datasets expand18.
This overview traces the progression from general-purpose semantic parsing techniques to supervised learning with RF and experimentation with neural network models. It then delves into the comparative analysis of SVM, RF, GB, KNN, and SGD, culminating in the adoption of LSTM and the innovative MaLSTM model, which marks a steady improvement in accuracy. Table 5 shows that our method outperforms all previous methods when trained on the SemEval Task 1 data.
SemEval is a series of international NLP research workshops. The SemEval Task 1 data consist of ordinary sentence pairs rather than question pairs; whether a dataset consists of questions affects the training and similarity calculations. The data fields include the sentence pair ID, sentence A, sentence B, a semantic relatedness gold label (on a 1–5 continuous scale), and a textual entailment gold label (NEUTRAL, ENTAILMENT, or CONTRADICTION). Table 6 shows 3 rows of sample SemEval Task 1 data.
The SemEval Task 1 data require only minor adjustments to be used in our model, listed as follows (a pandas sketch of this conversion appears after the list):
− adding sequence numbers following our question-pair format, with “id” starting from 0, and inserting two new columns, “qid1” and “qid2”;
− deleting the “relatedness” column;
− renaming the “entailment judgment” column to “is duplicate”;
− replacing every “ENTAILMENT” with “1” and every “NEUTRAL” or “CONTRADICTION” with “0”;
− converting the entire “txt” file to “csv” format.
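The following pandas sketch illustrates these adjustments; the file and column names follow the public SICK release of the Task 1 data and are assumptions, and the qid numbering scheme (mirroring the Quora format) is ours:

```python
import pandas as pd

# The SemEval Task 1 (SICK) release is a tab-separated "txt" file.
df = pd.read_csv("SICK_train.txt", sep="\t")

df = df.rename(columns={"pair_ID": "id",
                        "sentence_A": "question1",
                        "sentence_B": "question2",
                        "entailment_judgment": "is_duplicate"})
df = df.drop(columns=["relatedness_score"])        # delete the relatedness column
df["id"] = range(len(df))                          # "id" starts from 0
df.insert(1, "qid1", df["id"] * 2 + 1)             # two new question-id columns,
df.insert(2, "qid2", df["id"] * 2 + 2)             # mirroring the Quora format
df["is_duplicate"] = (df["is_duplicate"] == "ENTAILMENT").astype(int)

df.to_csv("semeval_task1_pairs.csv", index=False)  # convert txt to csv
```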
The adaptation of the SemEval Task 1 dataset was undertaken to ensure alignment with the core objective of this study—namely, the detection of semantically equivalent (i.e., duplicate) question pairs. To achieve this, the original three-way entailment classification (ENTAILMENT, NEUTRAL, CONTRADICTION) was transformed into a binary classification scheme. This binary conversion facilitated a consistent evaluation framework across all datasets used in this study (i.e., Quora, Stack Overflow, and SemEval), allowing the application of a unified model architecture without task-specific modifications. It is acknowledged that direct performance comparisons with models originally developed for full natural language inference classification tasks may introduce limitations in fairness due to differences in task formulation. To address this issue, we restricted our comparative analysis to baseline systems from prior studies that employed either semantic similarity scoring or binary entailment evaluation on the SemEval dataset. As presented in Table 5, the selected reference models (e.g., MaLSTM + SVM, ECNU, LangPro) were chosen specifically for their methodological alignment with our binary classification setting, thereby ensuring greater comparability and interpretive validity of the results.
The adjusted data format is shown in Table 7.
Our training data comprise question pairs, and each question has its own “id” tag. In this case, the answering program can be modified from the Siamese LSTM model. All questions from our training data, including the college online question-answering system question pairs and the SemEval Task 1 data, can be saved as raw alignment data to establish a corpus. The saved content is only the “id” and the vector representation rendered by the Siamese LSTM model. When a new question is entered and we want to detect the questions that duplicate it, we only compare the vector representations of the questions, calculate the similarity between them, and finally use the “id” tag to query the question content.
The basic Siamese LSTM splits the data into left and right parts. After loading all the information from the corpus, we built the left-part inputs: each corpus question was converted into vector form using the method defined in the Siamese LSTM model. Corresponding to each left input, the vector form of the new question was filled in as the right input. Comparing the two is equivalent to calculating the similarity between the new question and each question in the corpus.
Finally, the “id” corresponding to the question with the highest similarity is returned to the user. Multiple “id” values can also be returned, ranked from high to low by similarity. This kind of Q&A search system allows users to ask questions in natural language and accurately and quickly find answers from a large amount of data. Completely new questions are posted on the platform, awaiting responses from instructors. Notably, answers from students with strong knowledge and skills may also be included as standard answers.
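A minimal retrieval sketch under these assumptions follows (the corpus stores only each question's “id” and its Siamese LSTM vector; the Manhattan similarity mirrors the training objective, and the function names are ours):

```python
import numpy as np

def top_k_matches(new_vec, corpus_ids, corpus_vecs, k=3):
    """Return (id, similarity) pairs for the k most similar corpus questions.

    corpus_vecs: (n, d) array of stored encodings; new_vec: (d,) encoding of
    the incoming question (both produced by the Siamese LSTM branch).
    """
    sims = np.exp(-np.sum(np.abs(corpus_vecs - new_vec), axis=1))  # MaLSTM similarity
    order = np.argsort(-sims)[:k]               # indices sorted high to low
    return [(corpus_ids[i], float(sims[i])) for i in order]
```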
The computational performance of the proposed model was evaluated on both an NVIDIA GTX 1080 GPU and a standard Intel CPU platform. Experimental results indicate that the model achieves a favorable balance between predictive accuracy and processing efficiency. Specifically, the average inference latency remained below 200 milliseconds per query, rendering it suitable for real-time deployment in educational question-answering scenarios. The model architecture is inherently modular and supports parallel computation, and the integration of pretrained Word2Vec embeddings and the adoption of computationally efficient similarity measures, such as the Manhattan distance, reduce inference overhead.
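For reference, a rough latency-measurement sketch of this kind of evaluation is shown below; `answer_fn` and `queries` are placeholders for the deployed answering function and a sample workload, not names from our system:

```python
import time

def mean_latency_ms(answer_fn, queries, warmup=5):
    """Estimate average per-query latency of the answering pipeline."""
    for q in queries[:warmup]:          # warm-up to amortize lazy initialization
        answer_fn(q)
    start = time.perf_counter()
    for q in queries:
        answer_fn(q)
    return (time.perf_counter() - start) * 1000.0 / len(queries)
```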
Application-oriented interactive learning system for universities
This study designed an operational flowchart for an interactive learning system in universities, including six parts: interactive learning, data collection, data processing and training, real-time optimization technology, feedback collection mechanism, and continuous model optimization, as shown in Fig. 3.

Application-oriented interactive learning system for universities.
The interactive learning system for universities includes mobile and web platforms, mainly serving university students and teachers. Services are divided into AI services and manual services. AI services include a warm start on open-source data, LSTM + Attention, and instant question answering, and belong to the human-machine interaction level. Manual services include providing feedback on incorrect answers, annotating new questions for teachers, and verifying responses from students, teachers, and researchers, and belong to the interpersonal interaction level. Real-time data collection begins with the user's question input and synchronously obtains the user profile. In the data processing stage, text cleaning and spelling correction are performed, and the learning progress, knowledge graph associations, and historical question-answer triple context are dynamically injected to correlate information. Training is conducted on the Quora and Stack Overflow datasets, and comparative testing uses the SemEval Task 1 data. The model continuously updates and optimizes the interactive learning system through three paths: real-time optimization technology, a feedback collection mechanism, and continuous model optimization. Real-time optimization technology implements dynamic model selection (routing simple questions to a lightweight model and complex questions to the complete model), mixed-precision acceleration, and high-frequency result caching. The feedback mechanism combines explicit feedback (user ratings, accuracy feedback, and manual verification) with implicit feedback (depth of inquiry, dwell-time analysis, and identification of knowledge gaps). Continuous model optimization is based on daily incremental training: mining low-scored answers and high-difficulty questions, and using adversarial samples and knowledge-system updates to transfer teacher-model soft labels to student models. The final multimodal response includes core answers (with formula rendering), a dynamic knowledge graph, related exercises, and recommended videos, forming a closed-loop learning system of “data acquisition → intelligent reasoning → feedback optimization”.
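A simplified sketch of the dynamic model selection and high-frequency result caching described above follows; the routing rule and the model stand-ins are assumptions for illustration, not the production logic:

```python
from functools import lru_cache

# Stand-ins for the lightweight and complete (LSTM + Attention) models.
light_model = lambda q: "light answer to: " + q
full_model = lambda q: "full answer to: " + q

def is_simple(question: str) -> bool:
    # Hypothetical routing rule: short questions go to the lightweight model
    return len(question.split()) <= 10

@lru_cache(maxsize=4096)    # cache answers to high-frequency questions
def answer(question: str) -> str:
    model = light_model if is_simple(question) else full_model
    return model(question)
```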
The interactive learning system is based on human-machine collaborative evolution and constructs an “intelligent processing + humanistic care” structure. Through the dynamic context awareness (learning profile/knowledge graph) and multimodal responses (formula rendering/AR demonstration) of the human-computer interaction layer, the system realizes personalized cultivation paths; the teacher-student collaboration mechanism and historical Q&A reuse in the interpersonal interaction layer embody interpersonal interaction. In the interactive learning dimension, the closed-loop feedback system enables the machine to continuously optimize and promotes a “problem-solving → knowledge construction” spiral. Ultimately, knowledge graph visualization reproduces the inheritance of disciplinary context, personalized learning is reflected in learning path recommendations, and the deep analysis of problems by large language models embodies the spirit of digital humanities.
Satisfaction survey results
Basic information
In total, 377 valid questionnaires were collected. The surveyed institutions are higher vocational colleges and universities, and the respondents’ academic years are concentrated in the freshman (29.7%), sophomore (68.4%), and junior (1.9%) years. Of the respondents, 213 (56.5%) are male and 164 (43.5%) are female, and the most represented major is big data technology and application (25.2%).
Satisfaction analysis
In this design, we updated the version of the online Q&A system, performing a series of adjustments to the basic Siamese LSTM model based on the control variable method. This study used a 5-point Likert scale to survey 377 college students who used the online question-and-answer platform. Table 8 shows the dimensions and variables of the satisfaction survey. Reliability and validity testing show that the reliability of this study is above 0.85 and the KMO value is above 0.83.
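For transparency, a sketch of the reliability computation on the Likert-scale responses follows, assuming the reliability coefficient is Cronbach’s alpha, as is standard for Likert scales (the respondents × items matrix is a placeholder):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, n_items) matrix of Likert-scale answers."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_variance = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return k / (k - 1) * (1 - item_variances / total_variance)
```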
First, under the same conditions, the experimental results obtained with the Manhattan distance as the distance formula are always superior to those obtained with the cosine distance. Second, an attention mechanism was added without changing the other parts of the original model, i.e., an attention layer was added after each LSTM layer to select the information critical to the current task from a wide range of data; in some cases, accuracy improved by approximately 10%. Following these adjustments, we attempted to replace the word-embedding methods with a pretrained language model, which helps the sentences in question pairs maintain their semantic relationships. These results hold for the two sample datasets: question pairs from the college online Q&A system and Stack Overflow question pairs. Our system implements simple information extraction and a natural language understanding method (based on the Siamese LSTM model) to answer human questions.
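One possible form of such an attention layer, placed after an LSTM layer in one branch of the twin network, is sketched below (additive attention over time steps; an illustrative construction, not necessarily the exact layer used in the model):

```python
import tensorflow as tf
from tensorflow.keras import layers

class AttentionPooling(layers.Layer):
    """Additive attention over LSTM time steps (illustrative sketch)."""
    def build(self, input_shape):
        d = int(input_shape[-1])
        self.W = self.add_weight(name="W", shape=(d, d), initializer="glorot_uniform")
        self.v = self.add_weight(name="v", shape=(d, 1), initializer="glorot_uniform")

    def call(self, h):                                            # h: (batch, time, d)
        scores = tf.matmul(tf.tanh(tf.matmul(h, self.W)), self.v) # (batch, time, 1)
        alpha = tf.nn.softmax(scores, axis=1)                     # weights over time steps
        return tf.reduce_sum(alpha * h, axis=1)                   # weighted sum: (batch, d)

# One branch of the twin network: LSTM returning sequences, then attention
branch = tf.keras.Sequential([layers.LSTM(50, return_sequences=True),
                              AttentionPooling()])
```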

Satisfaction survey on the use of online platforms.
As shown in Fig. 4, AC has the highest score (4.4), with the item “I am satisfied with the accuracy of the platform’s Q&A system,” indicating that students are satisfied with the accuracy and efficiency of the online platform’s answers. PU (3.98) has the second-highest score, with the item “The updated platform Q&A system is better to use than before,” indicating that students are satisfied with the updated online system. HS (3.97) and PQ (3.97) have the third-highest scores, with the items “Our school’s online learning platform allows me to maintain high standards (quality) of online learning” and “When I need help with answering questions, I tend to ask questions on the platform’s Q&A system,” indicating that schools strongly support students’ online learning and that students often use the platform to answer questions. The figure also shows that Other use (UAP) has the lowest score of all items (2.55); its original item is “If I have any questions, I will search for some additional platforms to find answers.” Because this is a negatively worded item, a low score is reasonable, which indirectly indicates students’ high satisfaction with and high usage of the Q&A platform in this study.
