Crawling Microblog by Common-Designed Software

Indonesian Journal of Electrical Engineering and Computer Science

Crawling Microblog by Common-Designed Software

Abstract

A mount of data of microblogs is needed to be crawled for research, business analyzing, and so on. However, a lot of dynamic Web techniques are used in microblog Web pages. That makes it hard to crawl data by parsing the contents of Web pages for traditional Web page crawlers. Fortunately, microblogs provide APIs. Well-structured data can be returned to users simply by accessing those APIs in form of URLs. Basing on that mechanism, researchers have obtained some data from microblogs to research. Nevertheless, no common software for crawling microblog has been published up to now. Everyone has to start designing a microblog crawler from very beginning. A common software architecture based on microblog APIs for microblog crawler is proposed in this paper, which is named as MBCrawler. Its structure, architecture, and key classes are introduced. It can be seen that MBCrawler is modular and scalable. By implementing a real microblog crawler for Sina Weibo, it is shown that MBCrawler can fit specific features of different microblogs. DOI: http://dx.doi.org/10.11591/telkomnika.v11i7.2805

Discover Our Library

Embark on a journey through our expansive collection of articles and let curiosity lead your path to innovation.

Explore Now
Library 3D Ilustration