What are captcha fields for?

You've probably seen "captcha" fields when you've been filling in forms on websites. You get to the end of the form and just before you click the submit button, you've got to type the squiggly letters and numbers into a box.

What's this for?

The technical explanation is that it's to prevent "automated submissions". But what does that mean?

Depending on the type of form that is being filled in, there are different types of things that the owner might want to prevent.

For some reason, there are malicious people who write software that automatically fills in forms on the internet with junk and submits them. The website owner then receives all of these junk submissions as spam and has to sift through them for the real submissions.

On the other hand, the form may be for a really useful service that is really popular. Somebody may write some software that automatically submits data through the form, for example, to sign up for multiple accounts so that they can receive the benefit multiple times.

This can make it really difficult for the website owner to cope with the popularity of their own service.

The website owners want to ensure that only real people can submit data through the forms on their website, not malicious spammers submitting garbage, and not people trying to automate signups. And that's what these captcha fields are designed to do.

Humans can easily understand squiggly writing; our brains are really good at that. But computers find it really difficult. It's not impossible to write software to understand squiggly writing, but it is so difficult, and would require such an expensive and powerful computer, that it really wouldn't be worth the trouble.

So it's not a perfect solution, but in practice it works.

There's another thing that they can do too.

Have you come across those captcha fields where you have to type TWO words into the field?

These captcha fields are provided by the ReCaptcha organisation, which realised that humans' ability to translate squiggly writing into plain text is a very useful ability indeed.

There are lots of old books which went out of print before computers were commonplace. These books don't exist in a digital format, and there are literally thousands of these books that need to be converted to a digital format, so that they can either be reprinted or just distributed digitally.

So, they scan the books and use a computer to try to understand the text in the books. Sometimes, though, the text is too difficult for the software to understand: maybe it's smudged, the page was ripped, or the printer didn't print it very clearly. Humans can read it easily, but a computer finds it too difficult. And it would be too expensive to pay somebody to sit down and type them all up.

So they have latched onto the back of the captcha system. They show two squiggly words on the form. One of those words is the normal captcha test to make sure that you're a human. The other word is a randomly selected word from one of these books that the computer couldn't understand. You then provide the "translation" for free. You don't know which word is the test and which is the one that requires translation, so you have to get both of them right.

If you pass the test, the other word which you have now translated goes back to the ReCaptcha organisation to be slotted back into the digitised book in the correct place.
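If you like to see things in code, here is a rough sketch of the two-word idea described above. It is only an illustration, not how the ReCaptcha service is actually built: the names Challenge, new_challenge, check and transcriptions are all made up for this example. The point it shows is that only the known word can really be checked, and the answer for the unknown word is saved only when the known word was typed correctly.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Challenge:
    control_word: str      # word whose answer is already known (the human test)
    unknown_word_id: str   # hypothetical ID of a scanned word the computer couldn't read
    control_first: bool    # display order, chosen at random and hidden from the user


# Hypothetical store: every human answer collected for each unreadable word.
transcriptions = {}


def new_challenge(control_word, unknown_word_id):
    """Pair a known word with an unknown one, in a random order."""
    return Challenge(control_word, unknown_word_id, secrets.choice([True, False]))


def check(challenge, first_answer, second_answer):
    """Return True if the user passes; if so, keep their reading of the unknown word."""
    if challenge.control_first:
        control_answer, unknown_answer = first_answer, second_answer
    else:
        control_answer, unknown_answer = second_answer, first_answer

    # Only the control word can be verified, because it is the only known answer.
    if control_answer.strip().lower() != challenge.control_word.lower():
        return False

    # The user passed, so their reading of the unknown word is worth keeping.
    transcriptions.setdefault(challenge.unknown_word_id, []).append(
        unknown_answer.strip().lower()
    )
    return True
```

Because the user can't tell which word is being checked, they have to make an honest attempt at both. A real system would presumably also compare answers from several different people before trusting a transcription, but that part is left out here.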

It's an ingenious system. It's preventing abuse on the internet and it's helping a non-profit organisation to digitise thousands of books, newspapers and magazines to preserve them for future generations.

You can have a look at examples of text that has been digitised by a computer compared to text that has been digitised using the ReCaptcha system. The difference is astounding.