Benchmarks for Understanding Indian Legal Documents for NyAI

What is BUILD?

BUILD is a sequential sentence classification dataset that gives structure to Indian court judgements by labelling each sentence with its rhetorical role. Automatic structuring of court judgements is a foundational building block for other applications such as summarization and automatic charge identification. BUILD was created as part of the OpenNyAI mission by EkStep Foundation, Thoughtworks, Agami, National Law School's Law and Technology Society (Bangalore), and Rohini Nilekani Philanthropies.
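To illustrate how sentence-level rhetorical roles give a judgement structure, here is a minimal Python sketch. It is not the BUILD data format or any official tooling: the LabeledSentence class, the segment_by_role helper, the role names (FACTS, ARGUMENT, RULING), and the sample sentences are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple


@dataclass
class LabeledSentence:
    """One sentence of a judgement with its rhetorical role (hypothetical schema)."""
    text: str
    role: str


def segment_by_role(sentences: List[LabeledSentence]) -> List[Tuple[str, List[str]]]:
    """Merge runs of consecutive sentences that share a role into segments.

    Order is preserved, so a role can appear in more than one segment.
    """
    return [
        (role, [s.text for s in run])
        for role, run in groupby(sentences, key=lambda s: s.role)
    ]


# Illustrative input only; the role names are assumptions, not the BUILD label set.
judgement = [
    LabeledSentence("The appellant was employed by the respondent.", "FACTS"),
    LabeledSentence("The employment was terminated without notice.", "FACTS"),
    LabeledSentence("Counsel submits that the termination was unlawful.", "ARGUMENT"),
    LabeledSentence("The appeal is allowed.", "RULING"),
]

for role, texts in segment_by_role(judgement):
    print(role, len(texts))
```

Once sentences are grouped this way, a downstream application such as role-wise summarization can process each segment separately.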

For more details about BUILD, please refer to our paper:

For details about data download, preprocessing, baseline model training, and evaluation, please refer to the GitHub repository. To try rhetorical role-wise summarization on custom judgement text using the baseline model, please refer to the Colab notebook.

Getting started

BUILD is distributed under a CC BY-SA 4.0 License. The training and development sets can be downloaded below.

Once you have built your model, you can use the evaluation script we provide below to evaluate its performance by running: python <path_to_prediction> <path_to_gold>
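The leaderboard metric is Weighted-F1. As a rough guide to what that number measures, the sketch below computes a support-weighted F1 over per-sentence labels in plain Python. It is not the official evaluation script: the weighted_f1 function name, the flat list inputs, and the labels A and B are assumptions made for illustration, and the official script may read files and handle edge cases differently.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Per-label F1 averaged with weights equal to each label's share of gold sentences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of sentences")
    if not gold:
        raise ValueError("gold must not be empty")

    support = Counter(gold)  # number of gold sentences per label
    total = len(gold)
    score = 0.0
    for label, n_gold in support.items():
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = n_gold - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n_gold / total
    return score


# Hypothetical labels, one per sentence, in document order.
gold_labels = ["A", "A", "B", "B"]
pred_labels = ["A", "B", "B", "B"]
print(round(weighted_f1(gold_labels, pred_labels), 4))
```

Labels that appear only in the predictions carry zero weight here, because weights come from gold support; they still lower the score indirectly through the false negatives they cause for the true labels.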

To submit your models and evaluate them on the official test sets, please read our submission guide hosted on CodaLab.

Have Questions?

Ask us on our Slack channel.

Rank Model Code Weighted-F1

BERT-base HSLN (baseline model)
EkStep, Thoughtworks