Who is more likely to gain a large number of citations? - PowerPoint PPT Presentation



SLIDE 1
Who is more likely to gain a large number of citations?

Predicting the future influential researchers in a big scholarly network

SLIDE 2

MENU

1 Introduction
2 Dataset & Preprocessing
3 Predicting
4 Results and Conclusion

SLIDE 3

1

Introduction

SLIDE 4

Introduction

Whether a paper (or other work) is accepted or recognized often depends on its influence. One of the essential factors that indicates the influence of a scientific work is how frequently it is cited.

In this project, we first introduce the threshold model to get an idea of how information diffuses in real conditions, and then we introduce several regression models that fit the features to the citation counts to perform the prediction.

SLIDE 5

2

Dataset & Preprocessing

SLIDE 6

Dataset

Citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other sources. It can be used for clustering with network and side information, studying influence in the citation network, finding the most influential papers, topic modeling analysis, etc.

Citation Network Dataset

SLIDE 7

Preprocess

Parsing: each field in the raw file is marked by a prefix:
#* --- paperTitle
#@ --- Authors
#year --- Year
#conf --- publication venue
#citation --- citation number
#! --- Abstract
…

Extraction: Extract the text features from the original text, using word2vec in Gensim to extract the features inside the titles and abstracts.

Iteration: Iterate over the citations of papers with the same author or publication venue to convert these features into numeric values.
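The Extraction and Iteration steps can be sketched as follows. Two readings here are assumptions, not confirmed by the slides: a title is embedded as the average of its word vectors, and "iterating the citation with the same author or venue" is taken to mean replacing each author or venue with the mean citation count of its papers. In practice the vectors would come from a trained Gensim model, for example `Word2Vec(sentences, vector_size=100).wv`, which supports `in` and indexing like the plain dict used here. All function names are hypothetical.

```python
# Hypothetical sketch of the Extraction and Iteration steps.
from collections import defaultdict


def average_vector(tokens, vectors, dim):
    """Mean of the vectors of known tokens; a zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def group_mean_citations(papers, key):
    """Map each author (list field) or venue (string field) to its mean citation count."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for paper in papers:
        values = paper[key] if isinstance(paper[key], list) else [paper[key]]
        for value in values:
            totals[value] += paper["citation"]
            counts[value] += 1
    return {k: totals[k] / counts[k] for k in totals}


def numeric_features(paper, vectors, dim, author_score, venue_score):
    """Concatenate the title embedding with averaged author and venue scores."""
    title_vec = average_vector(paper["title"].lower().split(), vectors, dim)
    authors = paper["authors"]
    if authors:
        author_feat = sum(author_score.get(a, 0.0) for a in authors) / len(authors)
    else:
        author_feat = 0.0
    venue_feat = venue_score.get(paper["venue"], 0.0)
    return title_vec + [author_feat, venue_feat]


# Toy data: a 2-dimensional "embedding" and two papers.
toy_vectors = {"deep": [1.0, 0.0], "learning": [0.0, 1.0]}
papers = [
    {"title": "Deep Learning", "authors": ["A", "B"], "venue": "KDD", "citation": 10},
    {"title": "Graph Mining", "authors": ["A"], "venue": "KDD", "citation": 20},
]
author_score = group_mean_citations(papers, "authors")
venue_score = group_mean_citations(papers, "venue")
features = numeric_features(papers[0], toy_vectors, 2, author_score, venue_score)
```

When these scores feed a model, they should be computed from the training papers only; otherwise each paper's own citation count leaks into its features.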

SLIDE 8

Parsing
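A minimal parser for the prefixed record format listed on the Preprocess slide might look like the following. It is a Python sketch, not the project's code: it assumes one field per line, a blank line between records, and comma-separated author names, and the output keys (title, authors, year, venue, citation, abstract) are hypothetical.

```python
# Hypothetical sketch of parsing the prefixed record format. Assumes one
# field per line, a blank line between records, and comma-separated authors.

# Map each line prefix to an (assumed) output key.
FIELD_PREFIXES = [
    ("#*", "title"),
    ("#@", "authors"),
    ("#year", "year"),
    ("#conf", "venue"),
    ("#citation", "citation"),
    ("#!", "abstract"),
]


def parse_records(text):
    """Return one dict per paper record; lines with unknown prefixes are skipped."""
    records = []
    current = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line closes the current record.
            if current:
                records.append(current)
                current = {}
            continue
        for prefix, key in FIELD_PREFIXES:
            if line.startswith(prefix):
                value = line[len(prefix):].strip()
                if key == "authors":
                    current[key] = [a.strip() for a in value.split(",") if a.strip()]
                elif key in ("year", "citation"):
                    current[key] = int(value) if value.isdigit() else None
                else:
                    current[key] = value
                break
    if current:
        # The last record may not be followed by a blank line.
        records.append(current)
    return records


sample = """#*Deep Learning for Citations
#@Alice Smith, Bob Lee
#year 2015
#conf KDD
#citation 42
#!We study citation prediction.

#*Another Paper
#@Carol Wu
#year 2012
#conf ICML
#citation 7
"""
papers = parse_records(sample)
```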

SLIDE 9

We plot several diagrams to get a clearer picture of the preprocessed features.

Analysis Results

SLIDE 10

3

Predicting

SLIDE 11

Threshold Model

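Generally, earlier publications will have less influence in the future. Each node v has an information acceptance threshold θ_v and is affected by all of its active neighbor nodes A(v). Node v will be activated when the total influence of its active neighbors reaches its threshold; in the standard linear threshold form, with b_{u,v} the influence weight of neighbor u on v:

$$\sum_{u \in A(v)} b_{u,v} \ge \theta_v$$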
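The activation rule can be simulated as a simple cascade. This is a minimal sketch assuming the standard linear threshold setup with fixed thresholds: a hypothetical table `in_weights[v][u]` holds the influence weight b_{u,v} of neighbor u on v, and v activates once the summed weights of its active neighbors reach θ_v. The toy graph is invented for illustration.

```python
# Minimal linear threshold cascade. in_weights[v][u] is the (hypothetical)
# influence weight b_{u,v} of neighbor u on node v; theta[v] is v's threshold.


def linear_threshold(in_weights, theta, seeds):
    """Activate nodes until no inactive node's active-neighbor weight reaches its threshold."""
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, neighbors in in_weights.items():
            if v in active:
                continue
            # Total influence from neighbors that are already active.
            pressure = sum(w for u, w in neighbors.items() if u in active)
            if pressure >= theta[v]:
                active.add(v)
                changed = True
    return active


# Toy citation network: b is influenced by a, c by a and b, d by c.
in_weights = {
    "b": {"a": 0.6},
    "c": {"a": 0.3, "b": 0.3},
    "d": {"c": 0.2},
}
theta = {"b": 0.5, "c": 0.5, "d": 0.5}
reached = linear_threshold(in_weights, theta, {"a"})
```

Starting from paper a, nodes b and c activate while d stays inactive (0.2 < 0.5). Because nodes only ever move from inactive to active, the order in which they are checked does not change the final active set.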

SLIDE 12

Linear Regression & NLR

Use regress() to obtain the weight vector for multiple variables and then compute the predicted results. NLR (non-linear regression) models add extra features formed by multiplying existing features.
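MATLAB's `regress(y, X)` returns the least-squares coefficient vector (an intercept needs an explicit column of ones in `X`). The Python sketch below does the same through the normal equations and adds the NLR idea of product features. Using all pairwise products, including squares, is an assumption about which features were multiplied; function names are hypothetical.

```python
# Least-squares regression via the normal equations, plus NLR-style
# product features. Plain Python, no external libraries.


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(rows, y):
    """Weights [intercept, w1, ..., wk] minimizing squared error (X^T X w = X^T y)."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


def add_products(rows):
    """Append every pairwise product x_i * x_j (i <= j) to each row."""
    expanded = []
    for r in rows:
        prods = [r[i] * r[j] for i in range(len(r)) for j in range(i, len(r))]
        expanded.append(list(r) + prods)
    return expanded


def predict(weights, row):
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# A target with an interaction term that a purely linear model cannot capture.
grid = [(a, b) for a in range(3) for b in range(3)]
target = [1 + 2 * a + 3 * a * b for a, b in grid]
weights = fit_linear(add_products(grid), target)
```

With the product features the fit recovers the interaction exactly; the expanded feature order is x1, x2, x1², x1·x2, x2².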

SVM

Use fitrsvm to fit the model and predict the result. Related errors are calculated as well.
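`fitrsvm` fits a support vector regression model, whose defining piece is the ε-insensitive loss: residuals inside a tube of half-width ε cost nothing, larger ones cost linearly. The sketch below only evaluates the primal objective of a linear SVR for a given (w, b); it does not reproduce `fitrsvm`'s solver. In `fitrsvm`, ε and C correspond to the `Epsilon` and `BoxConstraint` name-value arguments; the values used here are illustrative.

```python
# Epsilon-insensitive loss and the primal objective of a linear SVR:
#   0.5 * ||w||^2 + C * sum_i max(0, |y_i - (w . x_i + b)| - epsilon)


def epsilon_insensitive(residual, epsilon):
    """Zero inside the tube, linear outside it."""
    return max(0.0, abs(residual) - epsilon)


def svr_objective(w, b, rows, y, epsilon, c):
    reg = 0.5 * sum(wi * wi for wi in w)
    loss = sum(
        epsilon_insensitive(yi - (sum(wi * xi for wi, xi in zip(w, r)) + b), epsilon)
        for r, yi in zip(rows, y)
    )
    return reg + c * loss


# Only the third point lies outside the tube (residual 2.0, loss 1.5).
value = svr_objective([2.0], 1.0, [[0.0], [1.0], [2.0]], [1.0, 3.0, 7.0], 0.5, 1.0)
```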

Different Models

Regression Tree

Use fitrtree to fit the model and predict the result. Related errors are calculated as well.
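`fitrtree` grows a binary regression tree by choosing, at each node, a split that reduces squared error, with leaves predicting the mean response. The Python sketch below is a simplified CART-style version; `max_depth` and `min_leaf` are hypothetical stopping rules, and splits are placed at observed feature values for simplicity.

```python
# Simplified CART-style regression tree with squared-error splits.


def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(rows, y, depth=0, max_depth=3, min_leaf=1):
    leaf = {"value": sum(y) / len(y)}
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return leaf
    best = None  # (error, feature, threshold)
    for f in range(len(rows[0])):
        # Exclude the largest value so the right child is never empty.
        for t in sorted(set(r[f] for r in rows))[:-1]:
            left = [yi for r, yi in zip(rows, y) if r[f] <= t]
            right = [yi for r, yi in zip(rows, y) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    # Stop if no split reduces the error.
    if best is None or best[0] >= sse(y):
        return leaf
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_leaf),
        "right": build_tree([rows[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_leaf),
    }


def predict_tree(node, row):
    while "value" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]


tree = build_tree([[1], [2], [3], [10], [11], [12]], [5, 5, 5, 20, 20, 20])
```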
SLIDE 13

· Data preprocessing: First, we import the data ("output.csv") and extract the citation column as the training target.
· Split: Then we split the data set into a training set and a testing set.
· Fitting: We choose different models to fit the features to the citations and obtain the optimal weight vector.
· Post-processing: We accumulate the citations of papers by the same author to represent that author's future impact.
· Compare: We calculate the errors to compare the models' performance.

Common Steps
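The Split, Post-processing, and Compare steps can be sketched as below. The 80/20 split, the fixed seed, and the choice of MAE and RMSE as the errors are assumptions; the slides do not state which errors were calculated. Post-processing follows the slide: citations (actual or predicted) are summed over each author's papers, and the two author-level totals are compared. Function names are hypothetical.

```python
# Sketch of splitting, author-level accumulation, and error comparison.
import math
import random
from collections import defaultdict


def train_test_split(n, test_fraction=0.2, seed=0):
    """Shuffle indices 0..n-1 and cut them into train and test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def author_impact(papers, citations):
    """Sum (actual or predicted) citations over each author's papers."""
    totals = defaultdict(float)
    for paper, cites in zip(papers, citations):
        for author in paper["authors"]:
            totals[author] += cites
    return dict(totals)


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


train_idx, test_idx = train_test_split(10)

# Toy test papers with actual and predicted citation counts.
papers = [{"authors": ["A", "B"]}, {"authors": ["A"]}]
true_impact = author_impact(papers, [10, 20])
pred_impact = author_impact(papers, [12, 16])
authors = sorted(true_impact)
mae_value = mae([true_impact[a] for a in authors], [pred_impact[a] for a in authors])
rmse_value = rmse([true_impact[a] for a in authors], [pred_impact[a] for a in authors])
```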

SLIDE 14

Specific Procedure

SLIDE 15

Specific Procedure

SLIDE 16

4

Results & Conclusion

SLIDE 17

Comparison of Performance

SLIDE 18

We quantify a researcher's impact as the number of times the researcher is cited. By comparing the performance of the different models, we find that the non-linear regression and SVM models give the better predictions among the four. In the future, we expect to implement more complex and accurate algorithms to study the prediction of a researcher's future impact in more depth.

Conclusion

SLIDE 19
  • 2018.05