AscentKB - Download

Download

Recommended usage

AscentKB v1.0.0, which contains 8.9M commonsense assertions, is available in 🤗 Datasets in two configurations: canonical, which consists of canonicalized assertions (see below for an explanation), and open, which consists of open assertions.

This KB is ~19 times larger than ConceptNet (note that in this comparison, non-commonsense knowledge in ConceptNet such as lexical relations is excluded).

More information is available on the Hugging Face Hub.

Example usage:
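
A minimal sketch of loading one of the two configurations through the 🤗 Datasets library. The Hub identifier ascent_kb is an assumption (check the dataset page on the Hub for the exact name), and load_ascent is a hypothetical helper written for illustration, not part of any library.

```python
# Configuration names as stated on this page.
CONFIGS = ("canonical", "open")

# ASSUMPTION: the dataset identifier on the Hub; verify it before use.
DATASET_ID = "ascent_kb"


def load_ascent(config="canonical"):
    """Load one AscentKB configuration (hypothetical helper)."""
    if config not in CONFIGS:
        raise ValueError(
            f"unknown configuration {config!r}; expected one of {CONFIGS}"
        )
    # Imported lazily so the check above works without the library installed.
    from datasets import load_dataset  # pip install datasets

    # Downloads and caches the data on first use.
    return load_dataset(DATASET_ID, config)
```

Under these assumptions, load_ascent("open") would fetch the open assertions and load_ascent("canonical") the canonicalized ones.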

Relation canonicalization

In AscentKB v1.0.0, besides providing the open assertions (as in the Browse functionality), we also mapped each assertion to a canonical relation. These relations are mostly based on the set of ConceptNet 5 relations, with slight modifications:

Two new relations were introduced: /r/HasSubgroup and /r/HasAspect (derived directly from the Ascent data model).
All /r/HasA relations were replaced with /r/HasAspect. This is motivated by the ATOMIC-2020 schema, although ATOMIC-2020 groups all /r/HasA and /r/HasProperty relations into /r/HasProperty.
The /r/UsedFor relation was replaced with /r/ObjectUse, which is broader (it could be "used for", "used in", "used as", etc.). This is also taken from ATOMIC-2020.
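
The schema differences listed above can be sketched as a small lookup, which may help when aligning ConceptNet 5 relation labels with AscentKB ones. This is only an illustration of the renaming, not the canonicalization step of the Ascent pipeline itself; the names REPLACEMENTS, NEW_RELATIONS, and to_ascent_relation are hypothetical, and the pass-through for all other relations is an assumption.

```python
# Relation replacements described on this page.
REPLACEMENTS = {
    "/r/HasA": "/r/HasAspect",
    "/r/UsedFor": "/r/ObjectUse",
}

# Relations that AscentKB adds on top of the ConceptNet 5 set.
NEW_RELATIONS = {"/r/HasSubgroup", "/r/HasAspect"}


def to_ascent_relation(conceptnet_relation: str) -> str:
    """Map a ConceptNet 5 relation label to its AscentKB counterpart.

    ASSUMPTION: relations without a listed replacement keep their label.
    """
    return REPLACEMENTS.get(conceptnet_relation, conceptnet_relation)


print(to_ascent_relation("/r/HasA"))
print(to_ascent_relation("/r/UsedFor"))
print(to_ascent_relation("/r/HasProperty"))
```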

Ascent KB v1.0.0

Download

Alternatively, you can download the dataset directly here:
ascent-v1.0.0.json.gz (678MB) - a JSON file of 8.9M commonsense assertions.
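
Given the size of the file, it may be convenient to stream it rather than load it at once. The sketch below assumes the archive holds one JSON object per line; this page does not state the layout, so check the first few bytes of the file before relying on it. The helper name iter_assertions is hypothetical.

```python
import gzip
import json
from typing import Iterator


def iter_assertions(path: str) -> Iterator[dict]:
    """Yield assertions one at a time without loading the whole file.

    ASSUMPTION: the file uses a JSON Lines layout (one object per line).
    """
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

If the file turns out to be a single JSON array instead, json.load on the opened handle would be the simpler choice, at the cost of holding all assertions in memory.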

Data fields (for ascent-v1.0.0.json.gz only)

arg1: the first argument to the relationship, e.g., elephant
rel: the canonical relation, e.g., /r/HasProperty
arg2: the second argument to the relationship, e.g., intelligence
support: the number of occurrences of the assertion, e.g., 15
facets: an array of semantic facets, each containing:

value: facet value, e.g., extremely
type: facet type, e.g., DEGREE
support: the number of occurrences of the facet, e.g., 11

source_sentences: an array of source sentences from which the assertion was extracted, each containing:

text: the raw text of the sentence
source: the URL to its parent document

subject: the original subject of the assertion (before canonicalization), e.g., elephant
predicate: the original predicate of the assertion (before canonicalization), e.g., be
object: the original object of the assertion (before canonicalization), e.g., intelligent
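
To make the field layout concrete, here is a sketch that parses one record into typed Python objects. The record is assembled from the example values in the field list above; the source sentence text and URL are placeholders, not real data, and the class and function names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Facet:
    value: str
    type: str
    support: int


@dataclass
class SourceSentence:
    text: str
    source: str


@dataclass
class Assertion:
    arg1: str
    rel: str
    arg2: str
    support: int
    facets: List[Facet]
    source_sentences: List[SourceSentence]
    subject: str
    predicate: str
    object: str


def parse_assertion(raw: dict) -> Assertion:
    """Convert one decoded JSON record into an Assertion."""
    facets = [Facet(f["value"], f["type"], f["support"]) for f in raw["facets"]]
    sentences = [
        SourceSentence(s["text"], s["source"]) for s in raw["source_sentences"]
    ]
    return Assertion(
        arg1=raw["arg1"],
        rel=raw["rel"],
        arg2=raw["arg2"],
        support=raw["support"],
        facets=facets,
        source_sentences=sentences,
        subject=raw["subject"],
        predicate=raw["predicate"],
        object=raw["object"],
    )


# Example values from the field list; sentence text and URL are placeholders.
SAMPLE = """
{"arg1": "elephant", "rel": "/r/HasProperty", "arg2": "intelligence",
 "support": 15,
 "facets": [{"value": "extremely", "type": "DEGREE", "support": 11}],
 "source_sentences": [{"text": "(placeholder sentence)",
                       "source": "https://example.org/placeholder"}],
 "subject": "elephant", "predicate": "be", "object": "intelligent"}
"""

assertion = parse_assertion(json.loads(SAMPLE))
print(assertion.arg1, assertion.rel, assertion.arg2)
```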

License

CC BY 4.0

Evaluation data

We provide the data used in our evaluation so that you can reproduce the experiment:

150 subjects for intrinsic evaluation: download,
50 questions for extrinsic evaluation: download.

Code

The codebase for the Ascent extraction pipeline can be found in this GitHub repository.