
    Datasets

    Datasets - Instruction Tuning

    Datasets List

    Figure: Natural language instructions databases

    1 https://github.com/allenai/unifiedqa

    2 https://github.com/LAION-AI/Open-Instruction-Generalist

    3 https://github.com/hkunlp/unifiedskg

    4 https://github.com/allenai/natural-instructions-v1

    5 https://github.com/allenai/natural-instructions

    6 https://huggingface.co/datasets/bigscience/P3

    7 https://github.com/bigscience-workshop/xmtf

    8 https://github.com/google-research/FLAN

    9 https://github.com/BAAI-Zlab/COIG

    10 https://github.com/orhonovich/unnatural-instructions

    11 https://github.com/yizhongw/self-instruct

    12 https://github.com/XueFuzhao/InstructionWild

    13 https://github.com/nlpxucan/evol-instruct

    14 https://github.com/tatsu-lab/stanford_alpaca

    15 https://github.com/csitfun/LogiCoT

    16 https://huggingface.co/datasets/databricks/databricks-dolly-15k

    17 https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM

    18 https://huggingface.co/datasets/GAIR/lima

    19 https://huggingface.co/datasets/JosephusCheung/GuanacoDataset

    20 https://github.com/LAION-AI/Open-Assistant

    21 https://github.com/project-baize/baize-chatbot

    22 https://github.com/thunlp/UltraChat#data

    Task Instructions

    Task Example

    Datasets - Benchmarks

    GSM8K - https://huggingface.co/datasets/gsm8k/viewer/main/train
