SAPIENTIA AEDIFICAVIT SIBI DOMVM
VENITE COMEDITE PANEM MEVM
ET BIBITE VINVM QVOD MISCVI VOBIS


I. Introduction

HistSci.us is the output of the academic research project The Knowledge Network in History of Science, developed by Rafael Lamardo as a Visiting Research Scholar at the Center for Science, Technology, Medicine and Society (CSTMS) at the University of California at Berkeley, in the United States.

The Knowledge Network in History of Science was developed to contribute new approaches to the study of scientific knowledge production, exploring the resources of data science and creating an epistemologically interconnected environment powered by the semantic web and data visualization.

The first release of HistSci.us covers the period from early modern history through the 20th century and addresses the following fields: History of Physics, History of Chemistry and History of Mathematics. These fields were selected for the research because of their interdisciplinary interactions, a critical issue when organizing structured knowledge bases.

The concept of 'Knowledge Network' adopted here comprises the data collections, applications and interface resources that together allow the exploration of knowledge. It does not overlap with existing technologies or methods, but creates additional possibilities for a new era driven by data and analysis.

This research design explores the universe of structured metadata to create a knowledge network in the History of Science. It intends to contribute to the History of Science by creating a visual framework for exploring the relations between theories, streams, and paradigms, which can be connected through a multi-sided space-time approach.


II. Methodology and Activities


This research explored the field at the intersection of Data Science and Epistemology, which I refer to as “Tech-Epistemology”, applied to the History of Science. It seeks to bring out the "Big Picture" and to support the understanding of the temporality of science and ideas.

It intends to contribute to the exploration of the boundaries of knowledge studies, including knowledge organization and knowledge representation, using concepts from Information Science and Data Science, among other related fields. In fact, the use of semantic data modeling and data visualization for knowledge organization drove most of the effort throughout the research.

The epistemological understanding of knowledge, long confined to the domain of Philosophy, has recently been transformed by the fast pace of the new digital thinking of the 21st century.

Today, the value of Data Science is widely recognized, so incorporating a data-driven approach into other fields of study, including the History of Science, can be considered a natural process.

Some methodologies and approaches commonly adopted in Data Science were considered for the research development, including methods and techniques applied to data analysis.

Even though the research was not based on a specific methodology, due to its exploratory and experimental nature, the fundamentals of these methodologies have been incorporated into the output framework published in this research, which includes schemas, data models, datasets and visualization models.

An important reference is the book Strengthening Data Science Methods for Department of Defense Personnel and Readiness Missions, published in 2017 by the National Academies Press.

Chapter 4, 'Overview of Data Science Methods', is dedicated to the main issues of data handling, from data preparation through the analytical process. It is highly recommended reading for those who want to understand the building blocks of the data thinking that drives the new world.

Furthermore, the chapter highlights one of the most important issues addressed in this research project: the complexity of transforming data into knowledge. It cites a 2013 report from the National Research Council, as described on page 53:

“The increase in the volume of data does not in and of itself lead to better outcomes. There are challenges associated with storing, indexing, linking, and querying large databases, but perhaps the most significant challenge is drawing meaningful inferences and decisions from analysis of the data. As discussed in the 2013 National Research Council report Frontiers in Massive Data Analysis, “Inference is the problem of turning data into knowledge, where knowledge often is expressed in terms of entities that are not present in the data per se but are present in models that one uses to interpret the data” (NRC, 2013, p. 3).”
National Academies Press, 2017

Other methods and frameworks, such as the Cross-Industry Standard Process for Data Mining (CRISP-DM) and the Team Data Science Process (TDSP), usually applied to machine learning, were also considered and merged into the final model.


Based on the analysis described above, the research activities can be clustered into the following groups:

1. Academic Understanding

The academic understanding of the project was defined by the research goal, described as follows:
'development of a Knowledge Network in History of Science, oriented to create an interactive blueprint about the scientific knowledge production from the early modern period to recent history; exploring advances in data science, and creating an epistemological interconnected environment powered by semantic web and data visualization resources'.

2. Data Understanding

The project began with an analysis of History of Science publications, assessing their main metadata structures and their grouping, classification and indexing methods.

That analysis is important for creating the macro level of the data model. It provides the initial inputs for classification and knowledge organization, even though these effectively take shape during data modeling.

3. Data Modeling

Data modeling can be considered the most important part of structuring a knowledge network. The modeling process must not only consider all the usage requirements but also define the building blocks of the knowledge organization.

The effective exploration of metadata, integrated with the use of complementary datasets, determines the relevance and epistemological effectiveness of the knowledge network.

An effective data model allows the exploration of new forms of classification, clustering and visualization.
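
As a rough illustration of what such a data model could look like, the sketch below represents entities with a time and a place coordinate, plus typed relations between them. It is not the schema used by HistSci.us: the class names, the fields and the predicate `same_author_as` are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A node of the network, e.g. a publication (hypothetical structure)."""
    id: str
    label: str
    discipline: str  # e.g. "History of Physics"
    year: int        # time coordinate
    place: str       # space coordinate


@dataclass(frozen=True)
class Relation:
    """A typed, directed link between two entities (hypothetical)."""
    source: str
    predicate: str
    target: str


class KnowledgeNetwork:
    def __init__(self):
        self.entities = {}   # entity id -> Entity
        self.relations = []  # list of Relation

    def add_entity(self, entity):
        self.entities[entity.id] = entity

    def relate(self, source, predicate, target):
        # Reject links to entities that were never registered.
        for entity_id in (source, target):
            if entity_id not in self.entities:
                raise KeyError(f"unknown entity: {entity_id}")
        self.relations.append(Relation(source, predicate, target))

    def between(self, start, end):
        """Return entities dated within [start, end], oldest first."""
        hits = [e for e in self.entities.values() if start <= e.year <= end]
        return sorted(hits, key=lambda e: e.year)

    def related_to(self, entity_id):
        """Return ids linked to entity_id in either direction."""
        linked = []
        for r in self.relations:
            if r.source == entity_id:
                linked.append(r.target)
            elif r.target == entity_id:
                linked.append(r.source)
        return linked


network = KnowledgeNetwork()
network.add_entity(Entity("principia", "Principia", "History of Physics", 1687, "London"))
network.add_entity(Entity("opticks", "Opticks", "History of Physics", 1704, "London"))
network.add_entity(Entity("traite", "Traité élémentaire de chimie", "History of Chemistry", 1789, "Paris"))
network.relate("opticks", "same_author_as", "principia")

# Publications dated between 1680 and 1710, in chronological order.
early = [e.id for e in network.between(1680, 1710)]
```

Keeping time, place and discipline as plain attributes of each entity is one simple way to let the same records be grouped by period, by location or by field without changing the model.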

4. Data Preparation

After the initial activities, the research focused on collecting the datasets that would populate the final data structure. The activities included data cleaning, data transformation, and data abstraction.

The datasets were organized in separate text files to be imported into the database later. The compilation process took longer than expected because part of the datasets was gathered from printed and online sources and had to be compiled manually at that stage of the research.
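
The kind of cleaning described above might look like the following minimal sketch. The tab-separated layout, the column names `title` and `year`, and the sample rows are assumptions for illustration; the actual file format of the project is not documented here.

```python
import csv
import io
import unicodedata


def clean_text(value):
    """Normalize Unicode to NFC and collapse runs of whitespace."""
    normalized = unicodedata.normalize("NFC", value or "")
    return " ".join(normalized.split())


def clean_rows(handle):
    """Yield cleaned, de-duplicated rows from a tab-separated text file."""
    seen = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        title = clean_text(row.get("title"))
        year_text = clean_text(row.get("year"))
        # Skip rows with a missing title or a non-numeric year.
        if not title or not year_text.isdecimal():
            continue
        key = (title.casefold(), int(year_text))
        if key in seen:
            continue  # same record compiled twice
        seen.add(key)
        yield {"title": title, "year": int(year_text)}


# Hypothetical raw input: stray spaces, a duplicate and an undated row.
RAW = "title\tyear\n  Opticks \t1704\nOpticks\t1704\nUntitled draft\tn.d.\n"
rows = list(clean_rows(io.StringIO(RAW)))
```

Rows that fail the checks are silently skipped here; for manually compiled data it may be preferable to log them for review instead.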

5. Data Validation

During validation, some data cleaning and normalization issues were fixed. The data models were also revised at this stage.

6. Data Deployment

Data deployment consisted of setting up the project database (currently in use) and importing the datasets from the text files into MySQL.
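
A text-file import of this kind can be sketched as follows. The project used MySQL; this example uses Python's built-in `sqlite3` module as a stand-in so that it runs without a database server. The table name `publication`, its columns and the sample data are assumptions, not the project's actual schema.

```python
import csv
import io
import sqlite3

# Hypothetical table; the real schema is not described in this document.
SCHEMA = """
CREATE TABLE IF NOT EXISTS publication (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    year  INTEGER NOT NULL,
    UNIQUE (title, year)
)
"""


def import_publications(conn, handle):
    """Load tab-separated rows into the table and return the row count."""
    conn.execute(SCHEMA)
    reader = csv.DictReader(handle, delimiter="\t")
    rows = [(row["title"], int(row["year"])) for row in reader]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO publication (title, year) VALUES (?, ?)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM publication").fetchone()[0]


DATA = "title\tyear\nPrincipia\t1687\nOpticks\t1704\n"
conn = sqlite3.connect(":memory:")
total = import_publications(conn, io.StringIO(DATA))
```

The uniqueness constraint makes the import safe to repeat: running it again with the same file leaves the table unchanged. With a MySQL driver, the placeholder style and the duplicate-handling syntax would differ.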

7. Evaluation/PoC

During the Evaluation/Proof-of-Concept (PoC), the data and models were validated. These activities were relevant for fine-tuning all aspects related to the data. Internally, they were also important because they allowed the first visualization of the research after months of work on conceptual models and data preparation.

8. Data Visualization

Integrated with data modeling, data visualization sharpens the relevance of a knowledge network. The usability of a body of knowledge is defined by how agents (human or non-human) interact with it and by how effective the process of consuming knowledge is in its applied context.

The data visualizations were designed to offer an informative, flexible, simple and intuitive use of historical data, and to contribute to the new data-driven comprehension of the world.

9. Project Deployment

Deployment is the last project activity and consists of publishing the research under a public URL, transferring the data to the definitive infrastructure, publishing the documentation and concluding the project.

III. Motivation

This research was initially motivated by the perception that an underexplored epistemological gap could be addressed by applying the conceptual basis of Data Science and Linked Knowledge to explore the hidden dimensions of knowledge organization.

Throughout my academic career, I have been intrigued by the complex mental abstraction that space-time cognition demands in historical analysis, where the cognitive conversion from mostly textual information to a space-time comprehension can be a barrier to effective learning in History studies.

In addition, the possibility of contributing to the development of a better epistemological experience for future students also empowered the research.


IV. About CSTMS


As a laboratory for the 21st century university, the Center for Science, Technology, Medicine and Society (CSTMS) conducts cross-disciplinary research, teaching, and outreach on the histories and implications of scientific research, biomedicine, and new technologies.

CSTMS played a key role in the research development through its progressive and interdisciplinary approach to the History of Science, which connects different backgrounds and approaches to science studies, as well as through its leading research on the implications of data and new technology for society today.

http://cstms.berkeley.edu/

Center for Science, Technology, Medicine & Society
University of California, Berkeley
543 Stephens Hall, #2350
Berkeley, CA - USA - 94720-2350
+1 (510) 642-4581


V. About Me

I am Rafael Lamardo, a Professor and Researcher with an interest in Science, Technology and Society. Currently, I am affiliated with CSTMS at the University of California at Berkeley as a Visiting Research Scholar.

I define myself as a Tech-Epistemologist. My recent research and studies address the disciplines of AI, Semantic Web, Knowledge Networks, Knowledge Representation and Linked Knowledge.

I graduated in History of Science from the Pontifical Catholic University of São Paulo (Brazil), with a background in Communications and Information Technology.

Keep in touch
Website: professorlamardo.com
Twitter: twitter.com/Prof_Lamardo 
LinkedIn: linkedin.com/in/lamardo

Rafael Lamardo

Science, Technology and Society


SAPIENTIA AEDIFICAVIT SIBI DOMVM

VENITE COMEDITE PANEM MEVM

ET BIBITE VINVM QVOD MISCVI VOBIS

This wonderful quote means:

"Wisdom has built a home for itself.
Come, eat my bread,
and drink the wine which I have prepared for you."

It is a tribute to UC Berkeley, the most inspiring place I have ever been in my life. The quote can be seen in the North Reading Room, at the Doe Memorial Library.

VI. Research Collaboration & Acknowledgments


My thanks go to the following people, who actively contributed to the research development:

Special thanks to Professor Massimo Mazzotti, Director of the Center for Science, Technology, Medicine & Society (CSTMS), Office for History of Science and Technology (OHST), UC Berkeley, United States, who supported the research development.

Special thanks to the History of Science community at the Center Simão Mathias for the History of Science (CESIMA/PUC-SP), Graduate Department of History of Science at the Pontifical Catholic University of São Paulo, in Brazil: Professor Ana Maria Alfonso-Goldfarb, Director of CESIMA, who has made great contributions to the development of the History of Science in Brazil and Latin America; Bruno Mattos, PhD Candidate, who made valuable contributions to the research on Digital Humanities issues; Professor José Luiz Goldfarb; and Odécio Souza, who has dedicated valuable effort to CESIMA's knowledge base in History of Science.

A special mention goes to Dayvid Lima, Computer Scientist (Brazil), who coded the project and supported me with technical issues throughout the research.

And last but not least, I also thank Professor Gabriel Vouga Chueke, in Brazil, who has made many contributions to my career; my lifelong friend Rodrigo Vargas; Christian Rauh; and Javier Fiaschi.

This research is dedicated to my Mom and Dad, Luiz and Celia, who have always supported and inspired me.