Who is more likely to gain a large number of citations
Predicting the future influential researchers in a big scholarly network
Introduction
Whether a work is accepted or recognized often depends … One of the factors that indicate the impact of a scientific work is its citation frequency.
In this project, we first introduce the threshold model to get a sense of how information diffuses under real conditions. We then fit several regression models to the features and citation counts to carry out the prediction.
Dataset & Preprocessing
Dataset
Citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other sources. The dataset can be used for clustering with network and side information, studying influence in the citation network, finding the most influential papers, topic modeling analysis, etc.
Citation Network Dataset
Preprocess
Parsing
Each record is parsed by its line prefix:
#* --- paperTitle
#@ --- Authors
#year --- Year
#conf --- publication venue
#citation --- citation number
#! --- Abstract
…
Iteration
Iterate over the citations of papers that share an author or publication venue to convert these features into numeric values.
Extraction
Use word2vec in Gensim to extract features from the titles and abstracts.
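The parsing and iteration steps can be sketched in Python as follows. This is a minimal sketch, not the project's code: the field names, the function names `parse_records` and `mean_citations`, the comma-separated author list, and the choice of the mean citation count as the numeric encoding are all assumptions made for illustration.

```python
from collections import defaultdict

# Line prefixes listed on the slide. The field names on the right are
# illustrative choices, not taken from the original project.
PREFIXES = {
    "#*": "title",
    "#@": "authors",
    "#year": "year",
    "#conf": "venue",
    "#citation": "citations",
    "#!": "abstract",
}


def parse_records(text):
    """Split a raw dump into one dict per paper; "#*" starts a new paper."""
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#*") and current:
            records.append(current)
            current = {}
        for prefix, name in PREFIXES.items():
            if line.startswith(prefix):
                current[name] = line[len(prefix):].strip()
                break  # lines with unknown prefixes are simply ignored
    if current:
        records.append(current)
    return records


def mean_citations(records, field, sep=None):
    """Map each value of `field` to the mean citation count of its papers.

    Assumption: using the mean is one possible numeric encoding. Pass
    sep="," for fields that hold several names on one line (e.g. authors).
    """
    sums, counts = defaultdict(int), defaultdict(int)
    for rec in records:
        cites = int(rec.get("citations", 0))
        raw = rec.get(field, "")
        names = raw.split(sep) if sep else [raw]
        for name in (n.strip() for n in names):
            if name:
                sums[name] += cites
                counts[name] += 1
    return {name: sums[name] / counts[name] for name in sums}
```

Calling `mean_citations(records, "authors", sep=",")` gives one number per author, and `mean_citations(records, "venue")` does the same per publication venue. Either mapping can then replace the text field in the feature table.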
Analysis Results
We plot some diagrams to get a clearer idea of the preprocessed features.
Predicting
Threshold Model
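Generally, earlier publications will have less influence in the future. Each node $v$ has an information acceptance threshold $\theta_v$ and is affected by all of its neighbors $w$, each with a weight $b_{v,w}$. Node $v$ will be activated when $\sum_{w \in A(v)} b_{v,w} \ge \theta_v$, where $A(v)$ is the set of active neighbors of $v$.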
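To make the activation rule concrete, here is a minimal Python sketch of the standard linear threshold process: a node becomes active once the summed weights of its already-active neighbors reach its threshold. The function name `linear_threshold` and the dictionary layout are hypothetical, and how nodes and weights map onto papers or authors is not specified in the slides.

```python
def linear_threshold(in_weights, thresholds, seeds):
    """Run a linear threshold cascade until no more nodes activate.

    in_weights[v] maps each neighbor w that influences v to its weight.
    thresholds[v] is the acceptance threshold of node v.
    seeds is the initially active set.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, neighbors in in_weights.items():
            if v in active:
                continue
            # Total influence received from neighbors that are already active.
            influence = sum(b for w, b in neighbors.items() if w in active)
            if influence >= thresholds[v]:
                active.add(v)
                changed = True
    return active
```

Because activation is monotone (an active node never deactivates), the final active set does not depend on the order in which nodes are visited.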
Different Models
Linear Regression & NLR
Use regress() to obtain the weight array for multiple variables, then compute the predicted results. NLR (non-linear regression) models add features by multiplying existing ones.
SVM
Use fitrsvm to fit the model and predict the result. Related errors are calculated as well.
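The NLR idea of adding features by multiplying others, followed by a least-squares fit, can be sketched in dependency-free Python. This is an illustration only: the slides use MATLAB's regress(), whereas `add_products`, `fit_least_squares`, and `predict` are hypothetical helpers, and solving the normal equations directly is a simplification that a numerical library would replace in practice.

```python
from itertools import combinations


def add_products(row):
    """Append every pairwise product x_i * x_j (i < j) to a feature row."""
    return list(row) + [a * b for a, b in combinations(row, 2)]


def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_least_squares(X, y):
    """Return weights (intercept first) that minimize the squared error."""
    Z = [[1.0] + list(row) for row in X]  # prepend a constant column
    k = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    Zty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(k)]
    return solve(ZtZ, Zty)


def predict(weights, row):
    """Apply fitted weights (intercept first) to one feature row."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))
```

Fitting on `[add_products(r) for r in X]` instead of `X` lets the same linear solver capture interaction effects between features, which is the sense in which the model becomes non-linear in the original inputs.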
Regression Tree
Use fitrtree to fit the model and predict the result. Related errors are calculated as well.
Specific Procedure
Common Steps
· Data preprocessing: First we import the data ("output.csv") and extract the citation column as the target for training.
· Split: Then we split the data set into a training set and a testing set.
· Fitting: We choose different models to fit the features to the citations and obtain the optimal weight vector.
· Post-processing: Accumulate the citations of the same author to represent that author's future impact.
· Compare: Calculate the errors to compare the performance.
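The common steps can be sketched end to end in Python. This is a simplified stand-in under stated assumptions, not the project's MATLAB code: rows are in-memory `(author, feature, citations)` tuples rather than the contents of "output.csv", the two "models" are a one-variable least-squares line and a mean baseline, the error measure is RMSE, and all function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def split(rows, test_fraction=0.25, seed=0):
    """Shuffle reproducibly and split into (training, testing) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def fit_line(xs, ys):
    """Closed-form one-variable least squares: y ~ slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(predicted, actual):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )


def accumulate_by_author(authors, values):
    """Sum per-paper values for each author (the post-processing step)."""
    totals = defaultdict(float)
    for author, value in zip(authors, values):
        totals[author] += value
    return dict(totals)


def run_pipeline(rows):
    """Split, fit, post-process, and compare on (author, feature, citations)."""
    train, test = split(rows)
    slope, intercept = fit_line([r[1] for r in train], [r[2] for r in train])
    train_mean = sum(r[2] for r in train) / len(train)
    actual = [r[2] for r in test]
    line_pred = [slope * r[1] + intercept for r in test]
    mean_pred = [train_mean] * len(test)
    return {
        "errors": {"line": rmse(line_pred, actual), "mean": rmse(mean_pred, actual)},
        "author_impact": accumulate_by_author([r[0] for r in test], line_pred),
    }
```

Keeping every model behind the same "fit on train, predict on test, score with one error function" shape is what makes the final comparison meaningful, regardless of which regressors are plugged in.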
Results & Conclusion
Comparison of Performance
Conclusion
We quantify impact as the number of citations of a … different models, we find that the non-linear regression and SVM models give the better predictions among the four. We expect to implement more sophisticated and accurate algorithms in the future to study future impact prediction in more depth.