Basic Statistics

  • Total images:
  • Average image size (px):
  • Maximum image size (px):
  • Minimum image size (px):

  • Total region descriptions: 4,297,502
  • Total image object instances: 1,366,673
  • Unique image objects: 75,729

  • Total object-object relationship instances: 1,531,448
  • Unique relationships: 40,480
  • Total attribute-object instances: 1,670,182
  • Unique attributes: 40,513

  • Total Scene Graphs: 108,249
  • Total Region Graphs: 3,788,715

  • Total Question Answers: 1,773,258

Common Statistics

                        Average number of objects   Average number of relationships   Average number of attributes
Per region annotation   -                           -                                 -
Per image               -                           -                                 -

Most common objects

Most common predicates

Most common attributes

        Region Descriptions

        • Average word length in description:
        • Average region width:
        • Average region height:

        Region Description Word Length

        Region Width

        Region Height

        Region Area

        Region per Image Distribution


        Top Region Phrases

        Top Region Words

        Sentence Objects per Image

        Sentence Objects per Region

        Sentence Objects per Bounding Box

        Top Object Names

        Objects and Categories

        Dataset                                    Images    Objects                              Categories   Objects per category
        VisualGenome                               -         -                                    -            -
        ImageNet Detection (training/validation)   476,688   534,309                              200          2671.5
        MS COCO                                    328,000   2,500,000                            91           27472.5
        Caltech 101                                9,144     9,144                                102          90
        Caltech 256                                30,608    30,608                               257          119
        Caltech Pedestrian                         250,000   350,000 (2,300 unique pedestrians)   1            350,000
        PASCAL Detection (training/validation)     11,530    27,450                               20           1372.5
        Zitnick Abstract Scenes                    10,020    58                                   11           5.27
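The "Objects per category" figures in the comparison above appear to be the object count divided by the category count. The short Python sketch below recomputes them under that assumption. The dictionary name and the shortened dataset labels are illustrative, and the VisualGenome column is left out because its values are not given here.

```python
# (objects, categories) pairs copied from the comparison table.
# Assumption: each number belongs to the dataset it is listed under here.
datasets = {
    "ImageNet Detection": (534_309, 200),
    "MS COCO": (2_500_000, 91),
    "Caltech 101": (9_144, 102),
    "Caltech 256": (30_608, 257),
    "Caltech Pedestrian": (350_000, 1),
    "PASCAL Detection": (27_450, 20),
    "Zitnick Abstract Scenes": (58, 11),
}

# Objects per category is a plain ratio of the two counts.
objects_per_category = {
    name: objects / categories
    for name, (objects, categories) in datasets.items()
}

for name, value in objects_per_category.items():
    print(f"{name}: {value:.2f}")
```

Rounded to the precision used in the table, these ratios match the listed values, which supports reading the row as a simple quotient.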
        Some categories:

          Objects and Actions

          Dataset                Images    Distinct objects   Distinct actions
          VisualGenome           -         -                  -
          Ikizler Actions        467       0                  6
          Willow (Delaitre)      968       5                  7
          Gupta Sports Actions   300       4                  6
          Stanford 40 Actions    9532      31                 40
          PPMI                   4800      7                  7
          PASCAL                 1221      6                  10
          89 Actions (Le)        2038      19                 89
          TUHOI                  10805     189                2974
          MS COCO-a              ~10000    81                 140

          Attribute Datasets

          VisualGenome aPascal and aYahoo Animals with Attributes (AWA)* PubFig* SUN Attribute* Caltech-UCSD Birds-200-2011* ImageNet ILSVRC 2013**
          Distinct Attributes 1,280 4,250 ? 102 312 17
          Attributes per image ? 85 73 102 312 16.55 (average objects per image * attributes per object)
          * This dataset uses binary attributes.
          ** This dataset has 6 attribute "categories"; humans chose how much of each attribute an object had.

          Attributes per Image

          Attributes per Region

          Attributes per Sentence Object

          Attributes per Category:

          Top Attributes

          Top Attributes on People*

          * Attributes describing all instances of "people", e.g. "man," "woman," "person."

          Relationships per Image

          Relationships per Region

          Relationships per Sentence Object

          Top Relationships

          Top Person-Like Relationships*

          * Relationships where both subject and object are instances of "people", e.g. "man," "woman," "person."

          We ran an experiment to test how diverse our descriptions are. We grouped all of our region descriptions into semantic clusters (see the paper for details). Some examples of clusters are shown in the infographic at the bottom. We found that, on average, each image has descriptions from 17 different clusters. We ran a similar experiment with Microsoft COCO's sentences and found that our images have descriptions from more clusters.

          Clusters per Image*

          * Clusters found through Minibatch K-means on vector representations of each RegionAnnotation phrase. These vectors were formed by averaging the Word2Vec vectors of each word in the phrase.
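
The footnote above names two steps: average the word vectors of each phrase, then cluster the phrase vectors with mini-batch K-means. The sketch below illustrates both steps in plain Python. It is not the authors' code: the 2-D `TOY_VECTORS` are a hypothetical stand-in for real Word2Vec embeddings, and the function names, batch size, and iteration count are assumptions chosen for the example.

```python
import random

# Hypothetical 2-D toy vectors standing in for real Word2Vec embeddings.
TOY_VECTORS = {
    "red": [1.0, 0.0], "blue": [0.9, 0.1], "car": [0.8, 0.2],
    "green": [0.0, 1.0], "tree": [0.1, 0.9], "grass": [0.2, 0.8],
}

def phrase_vector(phrase, vectors):
    """Average the vectors of the words in a phrase; None if no word is known."""
    known = [vectors[w] for w in phrase.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def nearest(point, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    def sq_dist(center):
        return sum((p - c) ** 2 for p, c in zip(point, center))
    return min(range(len(centers)), key=lambda j: sq_dist(centers[j]))

def minibatch_kmeans(points, k, batch_size=4, n_iter=50, seed=0):
    """Mini-batch k-means with a per-center learning rate of 1 / points seen."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k
    for _ in range(n_iter):
        batch = [rng.choice(points) for _ in range(batch_size)]
        # Assign the whole batch before moving any center.
        assigned = [nearest(p, centers) for p in batch]
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1 - eta) * c + eta * x for c, x in zip(centers[j], p)]
    return centers

phrases = ["red car", "blue car", "green tree", "green grass", "tree"]
points = [phrase_vector(p, TOY_VECTORS) for p in phrases]
centers = minibatch_kmeans(points, k=2)
labels = [nearest(p, centers) for p in points]
print(list(zip(phrases, labels)))
```

With real data, a library implementation and pretrained embeddings would replace the toy pieces; the sketch only shows how phrase averaging feeds the clustering step.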

          Clusters per Image – COCO Comparison*

          * Since COCO has 5 captions per image, we randomly sample 5 region annotations per image for a fairer comparison.
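
The sampling step in the footnote above can be sketched as follows: draw at most 5 region annotations per image, then count how many distinct clusters they fall into. This is an illustration under stated assumptions, not the original analysis code. The function name `clusters_per_image`, the image IDs, and the cluster labels are all hypothetical.

```python
import random

def clusters_per_image(labels_by_image, sample_size=None, seed=0):
    """Count distinct cluster labels per image, optionally on a random subsample."""
    rng = random.Random(seed)
    counts = {}
    for image_id, labels in labels_by_image.items():
        if sample_size is not None and len(labels) > sample_size:
            # Sample without replacement so each annotation is used at most once.
            labels = rng.sample(labels, sample_size)
        counts[image_id] = len(set(labels))
    return counts

# Hypothetical cluster labels for the region annotations of two images.
labels_by_image = {
    "img_a": [3, 3, 7, 12, 12, 40, 41, 41],
    "img_b": [5, 5, 5, 5, 5, 5],
}

full = clusters_per_image(labels_by_image)
sampled = clusters_per_image(labels_by_image, sample_size=5)
print(full, sampled)
```

Subsampling can only keep or lower the cluster count of an image, so comparing 5 sampled region annotations against 5 COCO captions avoids rewarding an image simply for having more annotations.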

          Cluster Visualization

          Top Image Synsets

          Top Object Synsets

          Top Attribute Synsets

          Top Relationship Synsets

          Top Region Synsets

          Top Question Synsets

          Top Answer Synsets

          Question Answering Statistics


          • Total QA pairs: 1,773,258
          • Total QA images: 101,174
          • Average question length (words): 6.0 ± 1.9
          • Average answer length (words): 1.9 ± 1.3

          Type         What      Where    When      Who      Why      How       Which
          Number       874,318   50,883   245,072   78,978   38,725   157,343   188,068
          Percentage   49.3%     2.9%     13.8%     4.5%     2.2%     8.9%      10.6%
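
The percentages in the table above appear to be each question type's count divided by the total number of QA pairs (1,773,258). A minimal Python sketch of that calculation follows; the variable names are illustrative.

```python
TOTAL_QA_PAIRS = 1_773_258

# Counts per question type, copied from the table above.
counts = {
    "What": 874_318, "Where": 50_883, "When": 245_072, "Who": 78_978,
    "Why": 38_725, "How": 157_343, "Which": 188_068,
}

# Share of all QA pairs per question type, rounded to one decimal place.
percentages = {
    qtype: round(100 * n / TOTAL_QA_PAIRS, 1) for qtype, n in counts.items()
}

# Number of QA pairs covered by the seven listed types.
typed_total = sum(counts.values())

print(percentages)
print(typed_total)
```

The seven counts sum to 1,633,387, which is less than the total, so the listed percentages do not add up to 100%. The page does not say how the remaining QA pairs are categorized.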

          © Stanford University