Modern Software Engineering 2011: Assignment 3

Requirements:

[Image 1]

Answers:

Academic search engines are useful tools for finding and accessing articles from academic journals and conferences. We selected three popular engines in this field and compare them below on several measures.

Brief Overview:

We selected Microsoft Academic Search and two competitors, Google Scholar and Arnetminer, for comparison.

Our measures for comparison are:

(1) Quality of returned results

(including the relevance of results to queries, coverage of areas, and the ability to do approximate matching);

(2) Update rate;

(3) User satisfaction;

(4) Extended services.

Measure 1: Quality of returned results

(1) Quality of results for papers

When we search for a particular paper (that is, enter all or most of its full title), Microsoft Academic Search returns only that paper, without any related results, while the other two search engines, Google Scholar and Arnetminer, return many related results.

[Images 2-4: Microsoft, Google, Arnetminer]

The three pictures shown above demonstrate the results of searching the paper "SPEED: Precise and Efficient Static Estimation of Program Computational Complexity".

(2) Quality of results for an academic field

When we search for an academic field, all three search engines return good results. The example this time is "Static Analysis".

[Images 5-7: Microsoft, Google, Arnetminer]

(3) Approximate matching

Approximate matching is indispensable for any search engine, so we ran an experiment to test it.

a) Search for "jiawei hen" instead of "jiawei han"

[Images 8-10: Microsoft, Google, Arnetminer]

This experiment shows that Microsoft Academic Search lacks approximate matching, while Google Scholar and Arnetminer tolerate the misspelling and also suggest an alternative query.
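To make the idea of approximate matching concrete, here is a minimal sketch based on Levenshtein edit distance. It is only an illustration: none of the three engines documents how it matches misspelled queries, and the names `edit_distance`, `suggest`, `KNOWN_AUTHORS`, and the threshold `max_distance` are hypothetical choices made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def suggest(query, known_names, max_distance=2):
    """Return known names within max_distance edits of the query, closest first."""
    q = query.lower()
    scored = [(edit_distance(q, name.lower()), name) for name in known_names]
    return [name for dist, name in sorted(scored) if dist <= max_distance]


# Hypothetical author index, used only for this illustration.
KNOWN_AUTHORS = ["Jiawei Han", "Jie Tang", "Sumit Gulwani"]

print(suggest("jiawei hen", KNOWN_AUTHORS))  # ['Jiawei Han']
```

The misspelled query is one substitution away from "jiawei han", so a tolerant engine can still offer the intended author as an alternative query.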

(4) Other considerations

Here we have to point out one detail about Google Scholar.

Google Scholar handles abbreviated names well. For example, when we type "S. Gulwani" instead of "Sumit Gulwani", the first several results from Google Scholar contain the keywords "Sumit Gulwani". Neither Microsoft Academic Search nor Arnetminer manages this.
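As a rough illustration of what handling abbreviated names involves, the sketch below treats a one-letter query token as an initial. This is an assumption made for the example, not a description of how Google Scholar works, and `matches_abbreviated` is a hypothetical helper.

```python
def matches_abbreviated(query: str, full_name: str) -> bool:
    """Return True if every query token equals the corresponding name token
    or is a single-letter initial of it (for example "S." for "Sumit")."""
    query_parts = query.replace(".", " ").lower().split()
    name_parts = full_name.lower().split()
    if len(query_parts) != len(name_parts):
        return False
    return all(
        name == q or (len(q) == 1 and name.startswith(q))
        for q, name in zip(query_parts, name_parts)
    )


print(matches_abbreviated("S. Gulwani", "Sumit Gulwani"))  # True
print(matches_abbreviated("J. Han", "Jie Tang"))           # False
```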

Arnetminer uses the H-Index, and Microsoft uses both the H-Index and the G-Index. Google Scholar offers neither, so it is hard to use it for ranking researchers.
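For readers unfamiliar with these two metrics, the following sketch computes them from a list of per-paper citation counts, using the standard definitions: the H-Index is the largest h such that h papers have at least h citations each, and the G-Index is the largest g such that the g most-cited papers together have at least g squared citations (here g is capped at the number of papers). The citation counts are made up for illustration; how each site gathers its citation data is not covered here.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations."""
    ranked = sorted(citations, reverse=True)
    g = 0
    total = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


# Made-up citation counts for one hypothetical researcher.
papers = [10, 8, 5, 4, 3]
print(h_index(papers))  # 4
print(g_index(papers))  # 5
```

The G-Index gives more weight to highly cited papers, which is why it can exceed the H-Index for the same researcher.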

Measure 2: Update rate

To compare how up to date the three search engines are, we present two cases. (The results were double-checked as of March 8, 2011.)

(1) Conference paper: The first case is "WSDM 2011". WSDM (Web Search and Data Mining) is the premier international ACM conference covering research on search and data mining on the Web; it took place on February 9-12, 2011 in Hong Kong. We searched for "WSDM 2011" on Google Scholar and, to our surprise, found more than 100 papers published at WSDM 2011, while the other two engines returned no suitable results.

For example, the accepted paper "On the Selection of Tags for Tag Cloud" can be found on Google Scholar but appears in neither Microsoft Academic Search nor Arnetminer. The paper is also available on the author's website, and Google links it to the correct ACM portal page so soon after the WSDM 2011 proceedings.

(2) Journal paper: The second case is "Topology-Adaptive Mesh Deformation for Surface Evolution, Morphing, and Multiview Reconstruction", a very recent paper published via the IEEE homepage (http://www.computer.org/portal/web/tpami/). The result is similar to the first case: Google Scholar outperforms its competitors thanks to its fast update rate.

Measure 3: User satisfaction

 (1) Loading Time

Loading times were measured with http://tools.pingdom.com/. The results are shown in the order Microsoft, Google, Arnetminer.

[Images 11-13: Microsoft, Google, Arnetminer]

As we can see, Google takes the least time to load its pages and Arnetminer the most. Google's pages have less content, while Microsoft's contain much more. Arnetminer's slowness may be due to network conditions.

(2) User experience

a) Advanced search (filtering the results):

Google offers many advanced search options, such as keyword specification, Author, Publication, Year, Specific Topics, Legal Opinions, and Patent Search.

Microsoft offers a smaller set of options: Author, Conference, Journal, and Year.

Arnetminer offers no advanced search.

b) Download experience

Viewing and downloading papers is a key function that users expect from such a search engine.

Google offers the best download experience. It provides three ways to reach a paper:

  • Official pages
  • Download links mined from the web (such as psu.edu, citeseer.com)
  • Nearby libraries

Microsoft offers a relatively good download experience, as it can link to some of the websites related to a given paper.

Arnetminer offers almost no download support.

c) User subscription:

The subscription experience is quite important, as it increases user stickiness.

Google: Subscriptions can be extended through Google Reader and Google Calendar. I have subscribed to several IEEE Transactions. Google can also send emails with new search results.

Microsoft: Very good subscription support for a standalone project; it offers subscriptions by author, journal, and search keyword.

Arnetminer: No subscription offered.

d) Citation:

It is a common scenario that researchers follow the references of papers to do a literature review before conducting research.

Google: Offers the different versions of a single publication and the papers that cite it.

Microsoft: Offers only the papers from the same conference.

Arnetminer: Offers a list of citing papers.

e) Miscellaneous:

Microsoft requires Silverlight to be installed, which makes viewing the co-author graph and the conference calendar impossible under Linux/Unix; users without Silverlight are offered no alternative such as Flash.

BibTeX (or another citation format) is best supported by Arnetminer.

Measure 4: Extended services

Google serves as a solid and powerful search engine for publications. However, it fails to extend its services as broadly as Microsoft Academic and Arnetminer do.

Both Microsoft Academic and Arnetminer provide name disambiguation, that is, they distinguish different authors who share the same name.

Both Microsoft Academic and Arnetminer direct you to a profile page dedicated to each paper, author, organization, conference, or journal. Specifically, the page for each author covers detailed academic information, such as affiliation, research interests, homepage, and list of publications. Similarly, the pages for an organization, conference, or journal provide detailed introductions, which are absent on Google.

Moreover, the Visual Explorer of Microsoft Academic explores scholars' collaboration networks through the co-author graph for a given author and the co-author path between two authors. Arnetminer goes a step further by mining advisor-advisee relationships between authors who have collaborated, so users can easily find the advisors or students of an author. Detailed examples are shown in the following.

[Image 14] Microsoft Academic: co-author graph for "Jiawei Han"

[Image 15] Microsoft Academic: co-author path between "Jiawei Han" and "Jie Tang"

[Image 16] Arnetminer: social graph for "Jie Tang" (Red lines stand for "advisor" and yellow lines stand for "advisee")

[Image 17] Arnetminer: social graph between "Jiawei Han" and "Jie Tang"
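A co-author path like the one described above can be found with a breadth-first search over a co-authorship graph. The sketch below is a generic illustration, not the method used by Microsoft Academic or Arnetminer; the graph `COAUTHORS` and its author names are placeholders, and `coauthor_path` is a hypothetical helper.

```python
from collections import deque


def coauthor_path(graph, start, goal):
    """Return one shortest chain of co-authors from start to goal, or None."""
    if start == goal:
        return [start]
    parents = {start: None}  # visited authors, each mapped to its predecessor
    queue = deque([start])
    while queue:
        author = queue.popleft()
        for neighbor in graph.get(author, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = author
            if neighbor == goal:
                # Walk back through the predecessors to rebuild the path.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Placeholder co-authorship graph: an edge means two authors share a paper.
COAUTHORS = {
    "Author A": ["Author B", "Author C"],
    "Author B": ["Author A", "Author D"],
    "Author C": ["Author A"],
    "Author D": ["Author B", "Author E"],
    "Author E": ["Author D"],
    "Author F": [],
}

print(coauthor_path(COAUTHORS, "Author A", "Author E"))
# ['Author A', 'Author B', 'Author D', 'Author E']
```

Breadth-first search visits authors in order of distance from the start, so the first path that reaches the goal has the fewest intermediate co-authors.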

Summary

To sum up, Microsoft Academic, Google Scholar, and Arnetminer each play a vital role in academic search and meet the needs of different users. For simplicity, we list their strengths and weaknesses in the table below.

Measures | Google Scholar | Microsoft Academic | Arnetminer
M1: Content | Satisfactory; good approximate matching; good with abbreviated names | Satisfactory; bad approximate matching; bad with abbreviated names | Satisfactory; good approximate matching; bad with abbreviated names
M2: Update Rate | Fastest | Slow | Slow
M3: User Satisfaction | Fastest; good download experience; good subscription experience; citation finding is fine | Relatively slow; satisfactory download experience; good subscription experience; citation finding is relatively poor | Slow; bad download experience; no subscription experience; citation finding is fine
M4: Extended Services | No | Name disambiguation; co-author graph and path; H-Index, G-Index; call-for-paper calendar; fails to mine deeper relationships | Name disambiguation; advisor-advisee social graph; H-Index

Recommendations for Microsoft Academic

1. Make it faster and improve its update rate (in fact, I believe Microsoft will have the latest proceedings at hand as soon as possible);

2. Extend the range of fields to areas such as mathematics, biology, and physics once computer science coverage is mature.

3. Offer a more convenient download experience, as this may be the most basic expectation of a large range of potential users.



4. Try more extended services, such as advisor-advisee relationship mining and expert finding. Explore more practical scenarios for such features rather than toy examples.

5. There are social networking examples such as Mendeley, which may use a different strategy, but Microsoft could integrate such features with its LIVE account as well as with its information in cmt.research.microsoft.com.

6. Subscriptions to new papers, trends, calls for papers, journal special issues, and the like are a good idea; making Microsoft Academic such an information hub would greatly improve user stickiness.

7. Microsoft actually has a clearer structure for academic search, its object-level vertical search, though part of the functionality is not yet mature. For a vertical search product that targets a more specific audience with more specific needs, user needs sometimes grow with what we can offer. Hence, once the basic needs are satisfied, meeting more sophisticated needs is also crucial to satisfying its potential users.
