Centre for Artificial Intelligence of UNL

Publication details

Main information
Extraction and Transformation of Data from Semi-Structured Text
June 2007
Raminhos07:thesis
The Extraction, Transformation and Loading (ETL) problem is becoming progressively less specific to the traditional data-warehousing domain and is being extended to the processing of textual data. The World Wide Web (WWW) is a major source of textual information, which follows a human-readable, semi-structured format and covers multiple domains, some of them highly complex. Traditional ETL approaches, which rely on developing specific source code for each data source and on multiple interactions between domain experts and computer-science experts, become an inadequate solution: time-consuming and error-prone. A novel approach to ETL is proposed, based on its decomposition into two phases: ETD (Extraction, Transformation and Data Delivery) followed by IL (Integration and Loading). The ETD proposal is supported by a declarative language for expressing ETD statements and a graphical application for interacting with the domain expert.
M. Sc. dissertation
Ricardo Raminhos
João Moura Pires
Universidade Nova de Lisboa
Export formats
Ricardo Raminhos, Extraction and Transformation of Data from Semi-Structured Text, João Moura Pires (superv.), Universidade Nova de Lisboa, June 2007.
<b>Ricardo Raminhos</b>, <u>Extraction and Transformation of Data from Semi-Structured Text</u>, <a href="/people/members/view.php?code=542b14e1830dcf7566974fd36b6fccc7" class="supervisor">João Moura Pires</a> (superv.), Universidade Nova de Lisboa, June 2007.
@mastersthesis {Raminhos07:thesis,
  author = {Ricardo Raminhos},
  title = {Extraction and Transformation of Data from Semi-Structured Text},
  school = {Universidade Nova de Lisboa},
  note = {Jo{\~a}o Moura Pires (superv.); },
  abstract = {The Extraction, Transformation and Loading (ETL) problematic is becoming progressively less specific to the traditional data-warehousing domain and is being extended to the processing of textual data. The World Wide Web (WWW) appears as a major source of textual information, following a human-readable semi-structured format, referring to multiple domains, some of them highly complex. Traditional ETL approaches following the development of specific source code for each data source and based on multiple domain / computer-science experts interactions, become an inadequate solution, time consuming and prone to error. A novel approach to ETL is proposed, based on its decomposition in two phases: ETD (Extraction, Transformation and Data Delivery) followed by IL (Integration and Loading). The ETD proposal is supported by a declarative language for expressing ETD statements and a graphical application for interacting with the domain expert.},
  keywords = {ETL, Declarative Approach, Text Processing},
  month = {June},
  year = {2007},
}
Publication's urls
/publications/view.php?code=ccafedc8cd2832e3b55f811fb1518f35
/publications/view.php?code=Raminhos07:thesis

Centre for Artificial Intelligence of UNL
Departamento de Informática, FCT/UNL
Quinta da Torre 2829-516 CAPARICA - Portugal
Tel. (+351) 21 294 8536 FAX (+351) 21 294 8541
