On Estimating the Scale of National Deep Web

Denis Shestakov, Tapio Salakoski, On Estimating the Scale of National Deep Web. In: Proceedings of DEXA'07, LNCS 4653, 780-789, Springer, 2007.

Abstract:

With the advances in web technologies, more and more information on the Web is contained in dynamically generated web pages. Among the several types of web 'dynamism', the most important is the case in which web pages are generated as results of queries submitted via search web forms to databases available online. These pages constitute the portion of the Web known as the deep Web. The existing estimates of the deep Web are predominantly based on studies of English deep web sites. The key parameters of other-than-English segments of the deep Web have not been investigated so far. Thus, currently known characteristics of the deep Web may be biased, especially owing to a steady increase in non-English web content. In this paper, we survey the part of the deep Web consisting of dynamic pages in one particular national domain. The estimation of the national deep Web is performed using the proposed sampling techniques. We report our observations and findings based on the experiments conducted in summer 2005.

BibTeX entry:

@INPROCEEDINGS{inpShSa07a,
  title = {On Estimating the Scale of National Deep Web},
  booktitle = {Proceedings of DEXA'07},
  author = {Shestakov, Denis and Salakoski, Tapio},
  series = {LNCS},
  volume = {4653},
  publisher = {Springer},
  pages = {780--789},
  year = {2007},
  keywords = {deep web, national web, web characterization},
}

Belongs to TUCS Research Unit(s): Data Mining and Knowledge Management Laboratory
