Chat-Your-Data Challenge

ChatGPT has taken the world by storm. Millions are using it. But while it’s great for general-purpose knowledge, it only knows what it was trained on: publicly available internet data, generally from before 2021. It doesn’t know about your private data, and it doesn’t know about recent data sources.

Wouldn’t it be useful if it did? This is where LangChain comes in.

The goal of LangChain is to make it easier for everyone to develop language model applications. We recently published a guide on how to create your own ChatGPT over your data. It included an example GitHub repo to start from and customize. Even so, there is a long tail of data sources to integrate with and write prompts for. We realized this after putting out a call asking which integrations would be most interesting and getting an overwhelming response.

To tackle this, we're launching the "Chat-Your-Data" Challenge: a week-long challenge to create ChatGPT over your own data sources.

Motivation

The motivation for doing this is, as always, to make it easier for everyone to develop language model applications. In particular, we believe that examples are critically important for helping people do so. Therefore, we are hoping to collect as many examples (data loaders + prompts) as possible, covering a wide variety of data sources.

We will then put the data loading logic in LangChain, put the prompts in LangChainHub, and put the examples in the LangChain documentation to make it as easy as possible for others to get started.

How to get started

  1. Clone the example GitHub repo
  2. Customize the data source + prompts for your data (you can follow this tutorial)
  3. Bonus: deploy a nice frontend to go along with it! The above tutorial includes an example deployment to Hugging Face Spaces.
  4. Submit your entry with this form
  5. Repeat!
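Step 2 is where most of the work happens. As a rough, hypothetical sketch (plain Python, not LangChain's actual loader, splitter, or prompt classes), customizing a data source usually boils down to three pieces: loading raw text, splitting it into chunks small enough to fit in a prompt, and formatting the relevant chunks into a question-answering prompt. The function names, chunk sizes, and prompt wording below are all assumptions for illustration.

```python
from pathlib import Path

# Hypothetical prompt template; the wording is illustrative, not taken from the example repo.
QA_PROMPT = (
    "You are an assistant answering questions about {source_name}.\n"
    "Use only the context below. If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def load_text_files(folder: str, pattern: str = "*.txt") -> list[tuple[str, str]]:
    """Hypothetical loader: return (filename, contents) pairs for matching files in `folder`."""
    return [(p.name, p.read_text(encoding="utf-8")) for p in sorted(Path(folder).glob(pattern))]


def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks; consecutive chunks share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous chunk's overlap.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def build_prompt(chunks: list[str], question: str, source_name: str) -> str:
    """Join the selected chunks into the context section of the QA prompt."""
    return QA_PROMPT.format(
        source_name=source_name,
        context="\n---\n".join(chunks),
        question=question,
    )
```

In a real app the chunks would typically be embedded and stored in a vector store, so only the chunks most relevant to a question get passed to `build_prompt`. The overlap keeps a sentence that straddles a chunk boundary visible in at least one chunk.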

Examples

We've created two example repos based on the example GitHub repo to show what this might look like:

Other source ideas we saw in the replies to our tweet include:

  • Obsidian
  • Gong calls
  • PDFs
  • Audio files (can use Whisper!)
  • Git repos
  • Arbitrary websites

And lots, lots more! If you're looking for ideas, just look in the replies to this tweet.

Will there be a winner?

Yes! What is a challenge without a winner?

The rules of engagement are as follows:

  • At the end of each day, we will tweet a list of all example GitHub repos submitted through the submission form
  • At the end of this week (2/12), we will freeze submissions and post a tweet thread with all the submitted GitHub repos
  • Whichever repo has the most stars by 2/19 will be the winner!

What do I win?

A limited edition LangChain t-shirt.