Iterative visual relationship detection via commonsense knowledge graph

Hai Wan, Jialing Ou, Baoyi Wang, Jianfeng Du* (Corresponding Author), Jeff Z. Pan, Juan Zeng* (Corresponding Author)

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Published conference contribution

1 Citation (Scopus)


Visual relationship detection, i.e., discovering the interactions between pairs of objects in an image, plays a significant role in image understanding. However, most recent work considers only visual features and ignores the implicit effect of common sense. Motivated by iterative visual reasoning in image recognition, we propose a novel model, named Iterative Visual Relationship Detection with Commonsense Knowledge Graph (IVRDC), that takes advantage of common sense in the form of a knowledge graph for visual relationship detection. Our model consists of two modules: a feature module that predicts predicates from visual features and semantic features with a bi-directional RNN, and a commonsense knowledge module that constructs a specific commonsense knowledge graph for predicate prediction. In each iteration, IVRDC combines the predictions from both modules and then updates the memory and the commonsense knowledge graph. The final predictions are made by taking the result of each iteration into account with an attention mechanism. Our experiments on the Visual Relationship Detection (VRD) dataset and the Visual Genome (VG) dataset demonstrate that our proposed model is competitive.
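
The iterative scheme described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the two modules' predictions are fused, what the memory contains, or how the attention weights are obtained. The sketch assumes that each module returns raw predicate scores, that fusion is a plain average of softmax probabilities, that the memory is simply the previous fused prediction, and that the attention logits are given as fixed numbers. All function names (`softmax`, `combine_iterations`, `iterative_predict`) are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def combine_iterations(per_iteration_probs, attention_logits):
    """Attention-weighted sum of the per-iteration predicate distributions."""
    weights = softmax(attention_logits)
    combined = [0.0] * len(per_iteration_probs[0])
    for w, probs in zip(weights, per_iteration_probs):
        for k, p in enumerate(probs):
            combined[k] += w * p
    return combined, weights


def iterative_predict(feature_module, knowledge_module, attention_logits):
    """Run one iteration per attention logit and combine the results.

    feature_module and knowledge_module are hypothetical callables that map
    the current memory (None in the first iteration) to raw predicate scores.
    """
    memory = None
    history = []
    for _ in attention_logits:
        visual = softmax(feature_module(memory))
        commonsense = softmax(knowledge_module(memory))
        # Assumption: fuse the two modules by averaging their probabilities.
        fused = [(v + c) / 2 for v, c in zip(visual, commonsense)]
        history.append(fused)
        memory = fused  # Assumption: the memory is the latest fused prediction.
    return combine_iterations(history, attention_logits)


# Toy usage with three candidate predicates and constant stand-in modules.
final, weights = iterative_predict(
    lambda memory: [2.0, 1.0, 0.0],
    lambda memory: [0.0, 1.0, 3.0],
    attention_logits=[0.1, 0.5, 1.0],
)
```

In a trained model the attention logits and both modules would be learned and would depend on the image; here they are constants so that the control flow is easy to follow.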

Original language: English
Title of host publication: Semantic Technology
Subtitle of host publication: 9th Joint International Conference, JIST 2019, Hangzhou, China, November 25–27, 2019, Proceedings
Editors: Xin Wang, Francesca Alessandra Lisi, Guohui Xiao, Elena Botoeva
Place of Publication: Switzerland
Number of pages: 16
ISBN (Electronic): 978-3-030-41407-8
ISBN (Print): 978-3-030-41406-1
Publication status: Published - 2020
Event: Joint International Semantic Technology Conference 2019 - Hangzhou, China
Duration: 25 Nov 2019 to 27 Nov 2019
Conference number: 9

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12032 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: Joint International Semantic Technology Conference 2019
Abbreviated title: JIST 2019

Bibliographical note

This paper was supported by the National Natural Science Foundation of China (No. 61375056, 61876204, 61976232, and 51978675), the Guangdong Province Natural Science Foundation (No. 2017A070706010 (soft science), 2018A030313086), the All-China Federation of Returned Overseas Chinese Research Project (17BZQK216), and the Science and Technology Program of Guangzhou (No. 201804010496, 201804010435).


Keywords

  • Commonsense knowledge graph
  • Visual Genome
  • Visual relationship detection

