From Zero to 100,000+: How We Built a Quote Library for The Literary Clock
Two years ago, we decided to build a literature-related product, NovellaMate, that collects and displays personalized quotes.
Over the past two years, we’ve been asked the same question countless times:
"Why do you need that many quotes? What does 100,000 or more even mean?"
To be honest, we didn’t set out to collect this many. But once we started, we quickly realized: we had to do it — not just because it was meaningful, but because it was necessary, and surprisingly hard.
Is 100,000 quotes a lot or a little?
Our original vision was simple: NovellaMate should be able to deliver the right quote at the right moment — based on time, weather, and feeling.
Sounds elegant, but from a data perspective, it’s a beast.
Let’s start with time-based quotes:
There are 1,440 minutes in a day. If we want one meaningful quote for every single minute — say, 7:42 a.m. or 11:19 p.m. — that’s 1,440 quotes minimum. And if we want to personalize quotes for different moods or moments (early morning, tea time, bedtime...), we need to multiply that by 3 or even more.
But where do you even find that many quotes? How do you verify them?
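To make the arithmetic above concrete, here is a minimal Python sketch. The multiplier of 3 is the rough estimate from the paragraph above, and the function name is ours, purely for illustration.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes in a day


def minimum_time_quotes(variants_per_minute: int = 1) -> int:
    """Smallest number of quotes needed to cover every minute of the day,
    given how many variants (moods, moments, ...) we want per minute."""
    return MINUTES_PER_DAY * variants_per_minute


print(minimum_time_quotes())   # one quote per minute: 1440
print(minimum_time_quotes(3))  # three variants per minute: 4320
```

And that is only the bare minimum for time, before weather and feeling enter the picture.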
Who else has tried something like this?
We’re not the first to dream of a literary time machine. Some great pioneers lit the way:
Christian Marclay’s “The Clock” (2010)
This is a 24-hour video art piece stitched together from thousands of film clips, each one showing a specific time on-screen. It took Marclay’s team three years to complete and won the Golden Lion at the Venice Biennale. We watched it in person at MoMA in New York in 2024. It left us speechless.
The Guardian’s Time Quote Project (2011–2013)
They ran a global crowdsourcing project to collect time-related literary quotes. After two years, they collected just over 1,000 quotes, which were showcased at the Edinburgh International Book Festival. A beautiful idea — and one we’ve drawn a lot of inspiration from.
A Fancy Literary Clock: Author Clock (2021)
This physical clock product displays literary quotes matched to the current time. The creators collected over 13,000 quotes and produced a beautiful, successful product. We bought one. We respect and admire it deeply.
And then there are the DIY heroes...
A talented developer jailbroke a Kindle to surprise his girlfriend with a DIY quote clock.
There’s a Literature Clock website and mobile app — both labors of love.
On GitHub, a nice community contributor published an open-source “author clock” project with 2,200+ quotes.
All of these creators mentioned the same thing: finding quotes is easy, cleaning and validating them is hard.
So why did we go for 100,000+?
One word: Repetition.
We’re heavy readers ourselves. After a week of using most quote-based products, we’d already start seeing the same quotes pop up again, and the thrill of discovery would be gone.
And we didn’t just want any quote. We wanted personalized quotes. If you love Murakami, we want to show you more Japanese modern lit. If you enjoy feminist literature, let’s surface more Woolf and Atwood. That requires volume.
Which is why we didn’t stop until we had six digits.
What went into it?
It took us two full years to build this collection. Here’s how:
Web-wide data mining
We scanned open literary databases (e.g. Project Gutenberg, Google Books), eBooks (e.g. Open Library), film scripts (e.g. IMDb), quotes from celebrities (e.g. Goodreads), and archives (e.g. Internet Archive), anything we could legally access, to match quotes to specific timestamps, weather, and moods. This was the most intellectually intensive job, since we had to write a lot of code for the data mining, analysis, and verification. Thank god we have Stan, the talented engineer on our team. He did a fantastic job!
Community contributions
Like The Guardian, we invited literature lovers from some Hong Kong colleges and universities to contribute their favorites. Not the most efficient method, but we found hidden gems this way. They are really nice professors and university students. We love them!
Manual validation
Every quote was manually checked for context, legality, and formatting. This was the hardest part, work that nearly broke our eyes and brains. As Natto on our team put it: “I think I went partially blind proofreading quotes.”
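For the curious, matching a passage to a specific timestamp can be sketched in a few lines of Python. This is not our actual pipeline, only a simplified illustration: the pattern, the function name, and the sample sentences are all made up for this post, and it only handles 12-hour times written with a.m. or p.m. Real text is far messier ("half past seven", "a quarter to midnight"), which is a big part of why manual validation was still needed.

```python
import re
from typing import Optional

# Hypothetical pattern: matches clock times such as "7:42 a.m.", "11:19 PM", or "7:42am".
TIME_PATTERN = re.compile(
    r"\b(1[0-2]|0?[1-9]):([0-5]\d)\s*([ap])\.?m\b\.?",
    re.IGNORECASE,
)


def minute_of_day(text: str) -> Optional[int]:
    """Return the minute of the day (0-1439) for the first 12-hour clock
    time found in the text, or None if no such time appears."""
    match = TIME_PATTERN.search(text)
    if match is None:
        return None
    hour = int(match.group(1)) % 12  # treat 12 as 0, so 12:xx a.m. falls in hour 0
    minute = int(match.group(2))
    if match.group(3).lower() == "p":
        hour += 12  # p.m. times land in the second half of the day
    return hour * 60 + minute


# Invented sample sentences, not real quotes from our library.
print(minute_of_day("It was 7:42 a.m. when the train finally left."))  # 462
print(minute_of_day("At 11:19 p.m. the lights went out."))             # 1399
print(minute_of_day("No clock in this sentence."))                     # None
```

Storing the result as a minute of the day (0 to 1439) means each passage can be filed under exactly one of the 1,440 slots.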
Is 100,000 enough?
Honestly, maybe. But just barely.
A few thousand quotes? Too easy to repeat.
Tens of thousands? Still not enough to personalize effectively.
At 100,000, we’re finally starting to reach the passing mark, where surprise, resonance, and variety actually feel possible.
To Be Continued…
This is just Chapter 1 of our journey. In the next post, we’ll share the joy and excitement we felt while collecting so many quotes.
If you’re a fellow literature lover, we hope you’ll stay with us. There’s so much more we want to tell you.