Publication details
Main information
Title:
Extraction and Transformation of Data from Semi-Structured Text
Publication date:
June 2007
Citation:
Raminhos07:thesis
Abstract:
The Extraction, Transformation and Loading (ETL) problem is becoming progressively less specific to the traditional data-warehousing domain and is being extended to the processing of textual data. The World Wide Web (WWW) is a major source of textual information, presented in a human-readable, semi-structured format and covering multiple domains, some of them highly complex. Traditional ETL approaches, which rely on developing specific source code for each data source and on multiple interactions between domain experts and computer-science experts, become an inadequate solution: time-consuming and prone to error. A novel approach to ETL is proposed, based on its decomposition into two phases: ETD (Extraction, Transformation and Data Delivery) followed by IL (Integration and Loading). The ETD proposal is supported by a declarative language for expressing ETD statements and a graphical application for interacting with the domain expert.
M. Sc. dissertation
Authors:
Ricardo Raminhos
Supervisors:
João Moura Pires
School:
Universidade Nova de Lisboa
Export formats
Plain text:
Ricardo Raminhos, Extraction and Transformation of Data from Semi-Structured Text, João Moura Pires (superv.), Universidade Nova de Lisboa, June 2007.
HTML:
<b>Ricardo Raminhos</b>, <u>Extraction and Transformation of Data from Semi-Structured Text</u>, <a href="/people/members/view.php?code=542b14e1830dcf7566974fd36b6fccc7" class="supervisor">João Moura Pires</a> (superv.), Universidade Nova de Lisboa, June 2007.
BibTeX:
@mastersthesis{Raminhos07:thesis,
  author   = {Ricardo Raminhos},
  title    = {Extraction and Transformation of Data from Semi-Structured Text},
  school   = {Universidade Nova de Lisboa},
  note     = {Jo{\~a}o Moura Pires (superv.)},
  abstract = {The Extraction, Transformation and Loading (ETL) problem is becoming progressively less specific to the traditional data-warehousing domain and is being extended to the processing of textual data. The World Wide Web (WWW) is a major source of textual information, presented in a human-readable, semi-structured format and covering multiple domains, some of them highly complex. Traditional ETL approaches, which rely on developing specific source code for each data source and on multiple interactions between domain experts and computer-science experts, become an inadequate solution: time-consuming and prone to error. A novel approach to ETL is proposed, based on its decomposition into two phases: ETD (Extraction, Transformation and Data Delivery) followed by IL (Integration and Loading). The ETD proposal is supported by a declarative language for expressing ETD statements and a graphical application for interacting with the domain expert.},
  keywords = {ETL, Declarative Approach, Text Processing},
  month    = {June},
  year     = {2007}
}
Publication's urls
Full url:
/publications/view.php?code=ccafedc8cd2832e3b55f811fb1518f35
Friendly url:
/publications/view.php?code=Raminhos07:thesis
Departamento de Informática, FCT/UNL
Quinta da Torre 2829-516 CAPARICA - Portugal
Tel. (+351) 21 294 8536 FAX (+351) 21 294 8541