Publication details
Main information
Title:
Declarative Approach to Data Extraction of Web pages
Publication date:
July 2009
Citation:
RA09-MScThesis
Abstract:
This thesis proposes a new, more modern extractor that can keep pace with the evolution of the Web, is generic enough to be used in any situation, and can be extended and easily adapted to more specific uses. The project began as an extension of an earlier extractor for semi-structured text files, but it evolved into a modular extraction system that extracts data from web pages and semi-structured text files and can be expanded to support other data source types. It also includes a more complete and generic validation system and a new data delivery system that performs the earlier deliveries as well as new generic ones. A graphical editor was also developed to support the extraction system's features and to allow a domain expert without computer knowledge to create extractions with only a few simple and intuitive interactions on the rendered web page.
M. Sc. dissertation
Authors:
Ricardo Freitas Alves
Supervisors:
João Moura Pires
School:
DI - FCT / UNL
Export formats
Plain text:
Ricardo Freitas Alves, Declarative Approach to Data Extraction of Web pages, João Moura Pires (superv.), DI - FCT / UNL, July 2009.
HTML:
<b>Ricardo Freitas Alves</b>, <u>Declarative Approach to Data Extraction of Web pages</u>, <a href="/people/members/view.php?code=542b14e1830dcf7566974fd36b6fccc7" class="supervisor">João Moura Pires</a> (superv.), DI - FCT / UNL, July 2009.
BibTeX:
@mastersthesis{RA09-MScThesis,
  author   = {Ricardo Freitas Alves},
  title    = {Declarative Approach to Data Extraction of Web pages},
  school   = {DI - FCT / UNL},
  note     = {Jo{\~a}o Moura Pires (superv.)},
  abstract = {This thesis proposes a new, more modern extractor that can keep pace with the evolution of the Web, is generic enough to be used in any situation, and can be extended and easily adapted to more specific uses. The project began as an extension of an earlier extractor for semi-structured text files, but it evolved into a modular extraction system that extracts data from web pages and semi-structured text files and can be expanded to support other data source types. It also includes a more complete and generic validation system and a new data delivery system that performs the earlier deliveries as well as new generic ones. A graphical editor was also developed to support the extraction system's features and to allow a domain expert without computer knowledge to create extractions with only a few simple and intuitive interactions on the rendered web page.},
  month    = {July},
  year     = {2009}
}
Publication's urls
Full url:
/publications/view.php?code=6268ff1dce15102ef0986824f69f6db0
Friendly url:
/publications/view.php?code=RA09-MScThesis
Departamento de Informática, FCT/UNL
Quinta da Torre 2829-516 CAPARICA - Portugal
Tel. (+351) 21 294 8536 FAX (+351) 21 294 8541