Centre for Artificial Intelligence of UNL

Publication details

Main information
Extraction and Transformation of Data from Semi-Structured Text
June 2007
The Extraction, Transformation and Loading (ETL) problem is becoming progressively less specific to the traditional data-warehousing domain and is being extended to the processing of textual data. The World Wide Web (WWW) is a major source of textual information, which follows a human-readable, semi-structured format and covers multiple domains, some of them highly complex. Traditional ETL approaches, which rely on developing specific source code for each data source and on repeated interaction between domain experts and computer-science experts, become inadequate: they are time-consuming and error-prone. A novel approach to ETL is proposed, based on its decomposition into two phases: ETD (Extraction, Transformation and Data Delivery) followed by IL (Integration and Loading). The ETD proposal is supported by a declarative language for expressing ETD statements and a graphical application for interacting with the domain expert.
Type: M.Sc. dissertation
Author: Ricardo Raminhos
Supervisor: João Moura Pires
School: Universidade Nova de Lisboa
Export formats
Ricardo Raminhos, Extraction and Transformation of Data from Semi-Structured Text, João Moura Pires (superv.), Universidade Nova de Lisboa, June 2007.
<b>Ricardo Raminhos</b>, <u>Extraction and Transformation of Data from Semi-Structured Text</u>, <a href="/people/members/view.php?code=542b14e1830dcf7566974fd36b6fccc7" class="supervisor">João Moura Pires</a> (superv.), Universidade Nova de Lisboa, June 2007.
@mastersthesis{Raminhos07:thesis,
  author = {Ricardo Raminhos},
  title = {Extraction and Transformation of Data from Semi-Structured Text},
  school = {Universidade Nova de Lisboa},
  note = {Jo{\~a}o Moura Pires (superv.)},
  abstract = {The Extraction, Transformation and Loading (ETL) problematic is becoming progressively less specific to the traditional data-warehousing domain and is being extended to the processing of textual data. The World Wide Web (WWW) appears as a major source of textual information, following a human-readable semi-structured format, referring to multiple domains, some of them highly complex. Traditional ETL approaches following the development of specific source code for each data source and based on multiple domain / computer-science experts interactions, become an inadequate solution, time consuming and prone to error. A novel approach to ETL is proposed, based on its decomposition in two phases: ETD (Extraction, Transformation and Data Delivery) followed by IL (Integration and Loading). The ETD proposal is supported by a declarative language for expressing ETD statements and a graphical application for interacting with the domain expert.},
  keywords = {ETL, Declarative Approach, Text Processing},
  month = {June},
  year = {2007}
}

Centre for Artificial Intelligence of UNL
Departamento de Informática, FCT/UNL
Quinta da Torre 2829-516 CAPARICA - Portugal
Tel. (+351) 21 294 8536 FAX (+351) 21 294 8541
