Recommended usage

AscentKB v1.0.0, which contains 8.9M commonsense assertions, is available in 🤗Datasets with two configurations: canonical, which consists of canonicalized assertions (see below for an explanation), and open, which consists of open assertions.

This KB is ~19 times larger than ConceptNet (note that in this comparison, non-commonsense knowledge in ConceptNet such as lexical relations is excluded).

More information is available on the HuggingFace Hub.

Example usage:
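
A minimal sketch of loading one configuration with the 🤗Datasets library might look like the following. The configuration names canonical and open come from the description above; the Hub identifier "ascent_kb" and the helper name load_ascent are assumptions for illustration, so check the Hub page for the exact identifier.

```python
# Configuration names as described above.
VALID_CONFIGS = ("canonical", "open")


def load_ascent(config="canonical", dataset_id="ascent_kb"):
    """Load one AscentKB configuration from the Hub.

    `dataset_id` is an ASSUMED identifier; verify it on the Hub page.
    `load_ascent` itself is a hypothetical helper, not part of any library.
    """
    if config not in VALID_CONFIGS:
        raise ValueError(
            f"unknown configuration {config!r}; expected one of {VALID_CONFIGS}"
        )
    # Imported lazily so the check above works even without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(dataset_id, config)


# Example call (downloads the data on first use):
# kb = load_ascent("canonical")
```

Validating the configuration name before calling the library gives an immediate, readable error for a mistyped name instead of a failed download.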

Relation canonicalization
In AscentKB v1.0.0, besides the open assertions (as in the Browse functionality), we also mapped each assertion to a canonical relation. These relations are mostly based on the set of ConceptNet 5 relations, with slight modifications:
  • Introducing 2 new relations: /r/HasSubgroup, /r/HasAspect (derived directly from Ascent data model).
  • All /r/HasA relations were replaced with /r/HasAspect. This is motivated by the ATOMIC-2020 schema, although ATOMIC-2020 groups both /r/HasA and /r/HasProperty into /r/HasProperty.
  • The /r/UsedFor relation was replaced with /r/ObjectUse, which is broader (it can mean "used for", "used in", "used as", etc.). This is also taken from ATOMIC-2020.
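
The two replacements listed above can be sketched as a simple lookup. This only illustrates the schema differences from ConceptNet 5; it is not the procedure that maps open assertions to canonical relations, and the names RELATION_REPLACEMENTS and to_ascent_relation are hypothetical.

```python
# Relation replacements described in the list above (ConceptNet 5 -> AscentKB).
RELATION_REPLACEMENTS = {
    "/r/HasA": "/r/HasAspect",
    "/r/UsedFor": "/r/ObjectUse",
}


def to_ascent_relation(conceptnet_relation):
    """Return the AscentKB counterpart of a ConceptNet 5 relation.

    Relations without a listed replacement are returned unchanged.
    """
    return RELATION_REPLACEMENTS.get(conceptnet_relation, conceptnet_relation)
```

For example, to_ascent_relation("/r/UsedFor") gives "/r/ObjectUse", while "/r/HasProperty" is left as it is.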

Ascent KB v1.0.0

Alternatively, you can directly download the dataset here:
ascent-v1.0.0.json.gz (678MB) - a JSON file of 8.9M commonsense assertions.

Data fields (for ascent-v1.0.0.json.gz only)
  • arg1: the first argument to the relationship, e.g., elephant
  • rel: the canonical relation, e.g., /r/HasProperty
  • arg2: the second argument to the relationship, e.g., intelligence
  • support: the number of occurrences of the assertion, e.g., 15
  • facets: an array of semantic facets, each containing
    • value: facet value, e.g., extremely
    • type: facet type, e.g., DEGREE
    • support: the number of occurrences of the facet, e.g., 11
  • source_sentences: an array of source sentences from which the assertion was extracted, each containing
    • text: the raw text of the sentence
    • source: the URL to its parent document
  • subject: the original subject of the assertion (before canonicalization), e.g., elephant
  • predicate: the original predicate of the assertion (before canonicalization), e.g., be
  • object: the original object of the assertion (before canonicalization), e.g., intelligent
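
A minimal sketch of reading the downloaded file and printing the fields listed above, using only the Python standard library. It assumes the file stores one JSON object per line; if it is a single JSON array instead, json.load on the opened file would be needed. The function names are hypothetical.

```python
import gzip
import json


def iter_assertions(path):
    """Yield one assertion dict per non-empty line of a gzipped file.

    ASSUMPTION: the file is in JSON Lines layout (one object per line).
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def format_assertion(a):
    """Render an assertion as 'arg1 rel arg2 (support=N) [TYPE=value, ...]'."""
    facets = ", ".join(f"{x['type']}={x['value']}" for x in a.get("facets", []))
    text = f"{a['arg1']} {a['rel']} {a['arg2']} (support={a['support']})"
    return f"{text} [{facets}]" if facets else text


# Example: print the first assertion of the downloaded file.
# for assertion in iter_assertions("ascent-v1.0.0.json.gz"):
#     print(format_assertion(assertion))
#     break
```

Reading the file as a stream avoids holding all 8.9M assertions in memory at once.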
License: CC BY 4.0
Evaluation data

We provide the data used in our evaluation so that you can reproduce the experiment:

  • 150 subjects for intrinsic evaluation: download,
  • 50 questions for extrinsic evaluation: download.
The codebase for the Ascent extraction pipeline can be found in this GitHub repository.