Basic Statistics
- Total images:
- Average image size (px):
- Maximum image size (px):
- Minimum image size (px):
- Total region descriptions: 4,297,502
- Total image object instances: 1,366,673
- Unique image objects: 75,729
- Total object-object relationship instances: 1,531,448
- Unique relationships: 40,480
- Total attribute-object instances: 1,670,182
- Unique attributes: 40,513
- Total Scene Graphs: 108,249
- Total Region Graphs: 3,788,715
- Total Question Answers: 1,773,258
Common Statistics
| | Average number of objects | Average number of relationships | Average number of attributes |
|---|---|---|---|
| Per region annotation | | | |
| Per image | | | |

| Most common objects | Most common predicates | Most common attributes |
|---|---|---|
| | | |
Region Descriptions
- Average word length in description:
- Average region width:
- Average region height:
Region Description Word Length
Region Width
Region Height
Region Area
Region per Image Distribution
Top Region Phrases
Top Region Words
Sentence Objects per Image
Sentence Objects per Region
Sentence Objects per Bounding Box
Top Object Names
Objects and Categories
| | VisualGenome | ImageNet Detection (training/validation) | MS COCO | Caltech 101 | Caltech 256 | Caltech Pedestrian | PASCAL Detection (training/validation) | Zitnick Abstract Scenes |
|---|---|---|---|---|---|---|---|---|
| Images | | 476,688 | 328,000 | 9,144 | 30,608 | 250,000 | 11,530 | 10,020 |
| Objects | | 534,309 | 2,500,000 | 9,144 | 30,608 | 350,000 (2,300 unique pedestrians) | 27,450 | 58 |
| Categories | | 200 | 91 | 102 | 257 | 1 | 20 | 11 |
| Objects per category | | 2671.5 | 27472.5 | 90 | 119 | 350,000 | 1372.5 | 5.27 |
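The "Objects per category" row is the object count divided by the category count for each dataset. The following is a minimal sketch that recomputes that row from the counts listed above; the dictionary layout and the function name `objects_per_category` are illustrative assumptions, not part of any Visual Genome tooling.

```python
# Counts copied from the comparison table: (objects, categories).
# Visual Genome is omitted because its cells are empty in this copy of the table.
DATASETS = {
    "ImageNet Detection": (534_309, 200),
    "MS COCO": (2_500_000, 91),
    "Caltech 101": (9_144, 102),
    "Caltech 256": (30_608, 257),
    "Caltech Pedestrian": (350_000, 1),
    "PASCAL Detection": (27_450, 20),
    "Zitnick Abstract Scenes": (58, 11),
}


def objects_per_category(objects: int, categories: int) -> float:
    """Mean number of object instances per category (hypothetical helper)."""
    return objects / categories


ratios = {name: objects_per_category(*counts) for name, counts in DATASETS.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:,.2f}")
```

The table appears to round these ratios differently per column (one decimal for some datasets, whole numbers for the Caltech sets), so small differences from the printed values are expected.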
Some categories:
Objects and Actions
* This dataset uses binary attributes.
** This dataset has 6 attribute "categories"; humans chose how much of each attribute an object had. See here for more details.
Attributes per Image
Attributes per Region
Attributes per Sentence Object
Attributes per Category:
Top Attributes
Top Attributes on People*
* Attributes describing all instances of "people", e.g. "man," "woman," "person."
Relationships per Image
Relationships per Region
Relationships per Sentence Object
Top Relationships
Top Person-Like Relationships*
* Relationships where both subject and object are instances of "people", e.g. "man," "woman," "person."
We ran an experiment to test how diverse our descriptions are. We grouped all of our region descriptions into semantic clusters (see the paper for details).
Some examples of clusters are shown in the infographic at the bottom. On average, each image had descriptions from 17 different clusters.
We conducted a similar experiment with Microsoft COCO's sentences and found that our images had descriptions from more clusters.
Clusters per Image*
* Clusters found through Minibatch K-means on vector representations of each RegionAnnotation phrase. These vectors were formed by averaging the Word2Vec vectors of each word in the phrase.
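The procedure in this footnote (average the word vectors of each phrase, cluster the phrase vectors with mini-batch k-means, then count distinct clusters per image) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the 2-D `WORD_VECTORS` are made-up stand-ins for real Word2Vec vectors, the mini-batch k-means is a small hand-rolled version rather than whichever implementation was actually used, and all function names are hypothetical.

```python
import random

# Hypothetical 2-D "word vectors"; real Word2Vec vectors have hundreds of dimensions.
WORD_VECTORS = {
    "red": (1.0, 0.0), "bus": (0.9, 0.1), "car": (0.8, 0.2),
    "green": (0.0, 1.0), "grass": (0.1, 0.9), "tree": (0.2, 0.8),
}


def phrase_vector(phrase):
    """Average the vectors of the known words in a phrase."""
    vectors = [WORD_VECTORS[w] for w in phrase.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError(f"no known words in phrase: {phrase!r}")
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))


def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def minibatch_kmeans(points, k, batch_size=4, iterations=50, seed=0):
    """Mini-batch k-means: each center is a running mean of the points assigned to it."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k
    for _ in range(iterations):
        batch = [rng.choice(points) for _ in range(batch_size)]
        # Cache assignments for the whole batch before moving any center.
        assignments = [nearest(p, centers) for p in batch]
        for point, j in zip(batch, assignments):
            counts[j] += 1
            rate = 1.0 / counts[j]  # per-center learning rate
            centers[j] = [(1 - rate) * c + rate * p for c, p in zip(centers[j], point)]
    return centers


def clusters_per_image(regions_by_image, centers):
    """Number of distinct clusters covered by each image's region phrases."""
    return {
        image_id: len({nearest(phrase_vector(p), centers) for p in phrases})
        for image_id, phrases in regions_by_image.items()
    }


# Toy data: image id -> region description phrases.
REGIONS = {
    1: ["red bus", "green grass", "red car"],
    2: ["green tree", "grass"],
}
points = [phrase_vector(p) for phrases in REGIONS.values() for p in phrases]
centers = minibatch_kmeans(points, k=2)
per_image = clusters_per_image(REGIONS, centers)
print(per_image)
```

Averaging `per_image.values()` over all images would give a figure comparable to the clusters-per-image number reported above, assuming the same clustering setup.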
Clusters per Image – COCO Comparison*
* Since COCO has 5 captions per image, we randomly sample 5 region annotations per image for a fairer comparison.
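The subsampling step in this footnote can be sketched as below. It is only an illustration: the function name `sample_regions`, the fixed seed, and the toy data are assumptions, and the handling of images with fewer than 5 regions (keep them all) is a choice made here, not something the source specifies.

```python
import random


def sample_regions(regions_by_image, per_image=5, seed=0):
    """Randomly keep at most `per_image` region descriptions for each image."""
    rng = random.Random(seed)
    return {
        image_id: rng.sample(phrases, min(per_image, len(phrases)))
        for image_id, phrases in regions_by_image.items()
    }


# Toy data: image id -> region description phrases.
regions = {
    1: ["a red bus", "green grass", "a tall tree", "blue sky",
        "a white cloud", "a parked car", "a street sign"],
    2: ["a dog", "a ball"],
}
sampled = sample_regions(regions)
for image_id, phrases in sampled.items():
    print(image_id, phrases)
```

Sampling without replacement keeps each retained description unique within an image, which matches the idea of comparing against 5 distinct COCO captions.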
Cluster Visualization
Top Image Synsets
Top Object Synsets
Top Attribute Synsets
Top Relationship Synsets
Top Region Synsets
Top Question Synsets
Top Answer Synsets
Question Answering Statistics
- Total QA pairs: 1,773,258
- Total QA images: 101,174
- Average question length (words): 6.0 ± 1.9
- Average answer length (words): 1.9 ± 1.3
| Type | What | Where | When | Who | Why | How | Which |
|---|---|---|---|---|---|---|---|
| Number | 874,318 | 50,883 | 245,072 | 78,978 | 38,725 | 157,343 | 188,068 |
| Percentage | 49.3% | 2.9% | 13.8% | 4.5% | 2.2% | 8.9% | 10.6% |
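The percentages appear to be each question type's count divided by the total number of QA pairs (1,773,258). A short sketch that recomputes them from the counts above; the variable names are illustrative.

```python
TOTAL_QA_PAIRS = 1_773_258

# Counts per question type, in the order listed in the table.
QUESTION_COUNTS = {
    "What": 874_318,
    "Where": 50_883,
    "When": 245_072,
    "Who": 78_978,
    "Why": 38_725,
    "How": 157_343,
    "Which": 188_068,
}

# Share of all QA pairs for each type, rounded to one decimal place.
percentages = {
    qtype: round(100 * count / TOTAL_QA_PAIRS, 1)
    for qtype, count in QUESTION_COUNTS.items()
}

for qtype, pct in percentages.items():
    print(f"{qtype}: {pct}%")
```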