Phrase Clustering Dataset (PCD)

Tags: amazon.science, json, natural language processing

Description

This dataset accompanies the paper "McPhraSy: Multi-Context Phrase Similarity and Clustering" by DN Cohen et al. (2022). PCD is intended for evaluating the quality of semantic clustering of noun phrases. The phrases were collected from the [Amazon Review Dataset](https://nijianmo.github.io/amazon/).

Update Frequency

Not updated

License

This data is available for anyone to use under the terms of the CDLA-Permissive license.

Documentation

https://amazon-phrase-clustering.s3.amazonaws.com/readme.md

Managed By

Amazon

Contact

Post any questions to re:Post and use the AWS Open Data tag.

How to Cite

Phrase Clustering Dataset (PCD) was accessed on DATE from https://registry.opendata.aws/pcd.

Resources on AWS

  • Description: Phrase Clustering Dataset (PCD)
    Resource type: S3 Bucket
    Amazon Resource Name (ARN): arn:aws:s3:::amazon-phrase-clustering
    AWS Region: us-west-2
    AWS CLI Access (No AWS account required): aws s3 ls --no-sign-request s3://amazon-phrase-clustering/
    Explore: phrase-clustering-dataset.json
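
Because the bucket allows access without an AWS account, the file can also be fetched over plain HTTPS. The following is a minimal Python sketch, not an official client. It assumes that phrase-clustering-dataset.json sits at the bucket root and that objects follow the same URL pattern as the documentation link (https://amazon-phrase-clustering.s3.amazonaws.com/...). The helper names public_url, load_dataset, and summarize are hypothetical, and nothing is assumed about the JSON schema.

```python
import json
import urllib.request

# Bucket name taken from the ARN listed above.
BUCKET = "amazon-phrase-clustering"
# Assumption: the file listed under "Explore" is stored at the bucket root.
KEY = "phrase-clustering-dataset.json"


def public_url(bucket: str, key: str) -> str:
    """Build the anonymous HTTPS URL for an object in a public S3 bucket."""
    return f"https://{bucket}.s3.amazonaws.com/{key}"


def load_dataset(url: str):
    """Download and parse the JSON file (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def summarize(data) -> str:
    """Describe only the top-level container; the schema is not assumed."""
    if isinstance(data, (list, dict)):
        return f"{type(data).__name__} with {len(data)} top-level entries"
    return type(data).__name__
```

Calling summarize(load_dataset(public_url(BUCKET, KEY))) downloads the file and reports its top-level shape. Consult the readme linked under Documentation for the actual field layout before relying on any particular structure.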
