Download
Recommended usage

AscentKB v1.0.0, which contains 8.9M commonsense assertions, is available in 🤗 Datasets in two configurations: canonical, which consists of canonicalized assertions (explained below), and open, which consists of open assertions.

This KB is ~19 times larger than ConceptNet (this comparison excludes non-commonsense knowledge in ConceptNet, such as lexical relations).

More information is available on the Hugging Face Hub.

Example usage:
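
A minimal sketch of loading one configuration with the 🤗 Datasets library. The dataset identifier `ascent_kb` is an assumption that is not stated on this page, and the helper name `load_ascent` is hypothetical; check the Hugging Face Hub entry for the exact identifier before relying on it.

```python
CONFIGS = ("canonical", "open")  # the two configurations named above


def load_ascent(config="canonical", dataset_id="ascent_kb"):
    """Load one AscentKB configuration via the Hugging Face `datasets` library.

    ASSUMPTION: `dataset_id` is the Hub identifier of the KB. It is a guess,
    not something this page confirms; change it if the Hub page differs.
    """
    if config not in CONFIGS:
        raise ValueError(f"config must be one of {CONFIGS}, got {config!r}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(dataset_id, config)
```

Calling `load_ascent("open")` would request the open assertions instead of the canonicalized ones. The call downloads data, so it needs network access and the `datasets` package.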

Relation canonicalization
In AscentKB v1.0.0, in addition to the open assertions (as shown in the Browse functionality), each assertion is also mapped to a canonical relation. These relations are mostly based on the ConceptNet 5 relation set, with slight modifications:
  • Two new relations were introduced: /r/HasSubgroup and /r/HasAspect (derived directly from the Ascent data model).
  • All /r/HasA relations were replaced with /r/HasAspect. This is motivated by the ATOMIC-2020 schema, although ATOMIC-2020 merges /r/HasA and /r/HasProperty into a single /r/HasProperty relation.
  • The /r/UsedFor relation was replaced with /r/ObjectUse, which is broader (it can mean "used for", "used in", "used as", etc.). This is also taken from ATOMIC-2020.
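
The modifications listed above can be summarized as a small lookup table. This is only an illustration of how the AscentKB relation set differs from ConceptNet 5, not code from the Ascent pipeline; the names `RELATION_REWRITES`, `NEW_RELATIONS`, and `to_ascent_relation` are hypothetical.

```python
# Relation rewrites described in the list above (ConceptNet 5 -> AscentKB).
RELATION_REWRITES = {
    "/r/HasA": "/r/HasAspect",
    "/r/UsedFor": "/r/ObjectUse",
}

# Relations introduced in AscentKB on top of the ConceptNet 5 set.
NEW_RELATIONS = {"/r/HasSubgroup", "/r/HasAspect"}


def to_ascent_relation(conceptnet_rel):
    """Return the AscentKB label for a ConceptNet 5 relation label.

    Labels without a rewrite rule are returned unchanged.
    """
    return RELATION_REWRITES.get(conceptnet_rel, conceptnet_rel)
```

For instance, `to_ascent_relation("/r/UsedFor")` gives `/r/ObjectUse`, while a relation such as `/r/HasProperty` passes through unchanged.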

Ascent KB v1.0.0
Download

Alternatively, you can download the dataset directly here:
ascent-v1.0.0.json.gz (678MB) - a JSON file of 8.9M commonsense assertions.

Data fields (for ascent-v1.0.0.json.gz only)
  • arg1: the first argument to the relationship, e.g., elephant
  • rel: the canonical relation, e.g., /r/HasProperty
  • arg2: the second argument to the relationship, e.g., intelligence
  • support: the number of occurrences of the assertion, e.g., 15
  • facets: an array of semantic facets, each containing
    • value: facet value, e.g., extremely
    • type: facet type, e.g., DEGREE
    • support: the number of occurrences of the facet, e.g., 11
  • source_sentences: an array of source sentences from which the assertion was extracted, each containing
    • text: the raw text of the sentence
    • source: the URL to its parent document
  • subject: the original subject of the assertion (before canonicalization), e.g., elephant
  • predicate: the original predicate of the assertion (before canonicalization), e.g., be
  • object: the original object of the assertion (before canonicalization), e.g., intelligent
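
The fields above can be read with the Python standard library alone. The sketch below assumes the file stores one JSON object per line; this page does not state the exact layout, so if the file turns out to be a single JSON array, `json.load` on the opened file would be needed instead. The helper names `iter_assertions` and `describe` are hypothetical.

```python
import gzip
import json


def iter_assertions(path):
    """Yield one assertion dict per non-empty line of a gzipped file.

    ASSUMPTION: the file is in JSON Lines form (one JSON object per line).
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def describe(assertion):
    """Render an assertion as 'arg1 rel arg2 (support=N; facets: TYPE=value)'."""
    facets = ", ".join(
        f"{f['type']}={f['value']}" for f in assertion.get("facets", [])
    )
    head = (
        f"{assertion['arg1']} {assertion['rel']} {assertion['arg2']} "
        f"(support={assertion['support']}"
    )
    return head + (f"; facets: {facets})" if facets else ")")
```

Because `iter_assertions` is a generator, the 8.9M assertions are streamed one at a time rather than loaded into memory at once.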
License
CC BY 4.0
Evaluation data

We provide the data used in our evaluation so that you can reproduce the experiment:

  • 150 subjects for intrinsic evaluation: download,
  • 50 questions for extrinsic evaluation: download.
Code
The codebase for the Ascent extraction pipeline can be found in this GitHub repository.