Natural Language Processing to Extract Meaningful Information from a Corpus of Written Knowledge in Breast Cancer: Transforming Books into Data

G. Catanuto; Nicola Rocco; Konstantina Balafa; Yazan Masannat; Andreas Karakatsanis; Anna Maglia; Peter Barry; Francesco Pappalardo; Maurizio B. Nava; Francesco Caruso

doi:10.1159/000530448

Corresponding author: Nicola Rocco

Applied Medicine

Research output: Contribution to journal › Article › peer-review

Abstract

Introduction: Books and papers are the most relevant source of theoretical knowledge for medical education. New artificial intelligence technologies can be designed to assist in selected educational tasks, such as reading a corpus made up of multiple documents and extracting relevant information in a quantitative way.
Methods: Thirty experts were selected transparently through an online public call on the website of the sponsor organization and on its social media. Six books edited or co-edited by members of this panel, containing general knowledge of breast cancer or specific surgical knowledge, were acquired. A team of computer scientists used this collection to train an artificial neural network based on a technique called Word2Vec.
Results: The corpus of six books contained about 2.2 billion words for 300-dimensional (300d) vectors. A few tests were performed. We evaluated cosine similarity between different words.
Discussion: This work represents an initial attempt to derive formal information from a textual corpus. It can be used to perform an augmented reading of the corpus of knowledge available in books and papers as part of a discipline. This can generate new hypotheses and provide an actual estimate of their association within the expert opinions. Word embedding can also be a good tool for accruing narrative information from clinical notes, reports, etc., and for producing predictions about outcomes. More work is expected in this promising field to generate “real-world evidence.”
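
The abstract reports that cosine similarity was evaluated between word vectors but does not show the computation. The following is a minimal sketch of that measure, not the authors' implementation. The function name `cosine_similarity`, the dictionary `toy_embeddings`, and the three-dimensional vectors are all hypothetical and made up for illustration; they are not taken from the trained model or from the paper's results, which used 300-dimensional vectors.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings: invented 3-d values for illustration only.
# A Word2Vec model as described in the abstract would supply 300-d vectors.
toy_embeddings = {
    "mastectomy": [0.9, 0.1, 0.3],
    "lumpectomy": [0.8, 0.2, 0.4],
    "chemotherapy": [0.1, 0.9, 0.2],
}

if __name__ == "__main__":
    for other in ("lumpectomy", "chemotherapy"):
        score = cosine_similarity(toy_embeddings["mastectomy"], toy_embeddings[other])
        print(f"mastectomy vs {other}: {score:.3f}")
```

Values close to 1 indicate vectors pointing in nearly the same direction, which in a word-embedding model is usually read as the two words appearing in similar contexts within the corpus.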

Original language	English
Pages (from-to)	209-212
Number of pages	4
Journal	Breast Care
Volume	18
Issue number	3
Early online date	10 May 2023
DOIs	https://doi.org/10.1159/000530448
Publication status	Published - Jun 2023

Keywords

breast cancer
artificial intelligence
medical education

Access to Document

10.1159/000530448
Licence: Unspecified

Cite this

@article{704f44169148408abb7858b202c34bab,

title = "Natural Language Processing to Extract Meaningful Information from a Corpus of Written Knowledge in Breast Cancer: Transforming Books into Data",

abstract = "Introduction: Books and papers are the most relevant source of theoretical knowledge for medical education. New artificial intelligence technologies can be designed to assist in selected educational tasks, such as reading a corpus made up of multiple documents and extracting relevant information in a quantitative way. Methods: Thirty experts were selected transparently through an online public call on the website of the sponsor organization and on its social media. Six books edited or co-edited by members of this panel, containing general knowledge of breast cancer or specific surgical knowledge, were acquired. A team of computer scientists used this collection to train an artificial neural network based on a technique called Word2Vec. Results: The corpus of six books contained about 2.2 billion words for 300-dimensional (300d) vectors. A few tests were performed. We evaluated cosine similarity between different words. Discussion: This work represents an initial attempt to derive formal information from a textual corpus. It can be used to perform an augmented reading of the corpus of knowledge available in books and papers as part of a discipline. This can generate new hypotheses and provide an actual estimate of their association within the expert opinions. Word embedding can also be a good tool for accruing narrative information from clinical notes, reports, etc., and for producing predictions about outcomes. More work is expected in this promising field to generate “real-world evidence.”",

keywords = "breast cancer, artificial intelligence, medical education",

author = "G. Catanuto and Nicola Rocco and Konstantina Balafa and Yazan Masannat and Andreas Karakatsanis and Anna Maglia and Peter Barry and Francesco Pappalardo and Nava, {Maurizio B.} and Francesco Caruso",

year = "2023",

month = jun,

doi = "10.1159/000530448",

language = "English",

volume = "18",

pages = "209--212",

journal = "Breast Care",

issn = "1661-3791",

publisher = "Karger",

number = "3",

}

TY - JOUR

T1 - Natural Language Processing to Extract Meaningful Information from a Corpus of Written Knowledge in Breast Cancer

T2 - Transforming Books into Data

AU - Catanuto, G.

AU - Rocco, Nicola

AU - Balafa, Konstantina

AU - Masannat, Yazan

AU - Karakatsanis, Andreas

AU - Maglia, Anna

AU - Barry, Peter

AU - Pappalardo, Francesco

AU - Nava, Maurizio B.

AU - Caruso, Francesco

PY - 2023/6

Y1 - 2023/6

AB - Introduction: Books and papers are the most relevant source of theoretical knowledge for medical education. New artificial intelligence technologies can be designed to assist in selected educational tasks, such as reading a corpus made up of multiple documents and extracting relevant information in a quantitative way. Methods: Thirty experts were selected transparently through an online public call on the website of the sponsor organization and on its social media. Six books edited or co-edited by members of this panel, containing general knowledge of breast cancer or specific surgical knowledge, were acquired. A team of computer scientists used this collection to train an artificial neural network based on a technique called Word2Vec. Results: The corpus of six books contained about 2.2 billion words for 300-dimensional (300d) vectors. A few tests were performed. We evaluated cosine similarity between different words. Discussion: This work represents an initial attempt to derive formal information from a textual corpus. It can be used to perform an augmented reading of the corpus of knowledge available in books and papers as part of a discipline. This can generate new hypotheses and provide an actual estimate of their association within the expert opinions. Word embedding can also be a good tool for accruing narrative information from clinical notes, reports, etc., and for producing predictions about outcomes. More work is expected in this promising field to generate “real-world evidence.”

KW - breast cancer

KW - artificial intelligence

KW - medical education

UR - http://www.scopus.com/inward/record.url?scp=85164430241&partnerID=8YFLogxK

U2 - 10.1159/000530448

DO - 10.1159/000530448

M3 - Article

SN - 1661-3791

VL - 18

SP - 209

EP - 212

JO - Breast Care

JF - Breast Care

IS - 3

ER -
