May 06

MailChimp Chief Data Scientist John Foreman likes to talk about orange juice. On the surface, it’s a strange way to start a discussion about data, but it all starts to make sense when you peel back the rind. It’s a way of thinking that’s letting MailChimp — which sends about 35 billion emails a year on behalf of roughly 3 million users — transform itself into a data-driven business 12 years into its existence.

When you’re in Atlanta, as I was during a recent trip, the obvious place to start talking about orange juice and data is with Coca-Cola. Foreman can tell you all about how the beverage giant, whose headquarters tower over the city just a mile away from MailChimp’s office, uses advanced algorithms and giant vats of different juices to ensure the proper flavor of its Simply Orange line of orange juice. However, it’s something else Coca-Cola is doing that inspired the way Foreman thinks about data and that’s helping MailChimp re-imagine what it means to engage with fans, readers and customers through their inboxes.

Anyone familiar with how large web companies came to pioneer the practice of what we now call “big data” should appreciate the analogy. Coca-Cola, which also owns Minute Maid, produces a lot of excess pulp when it makes orange juice. For decades, presumably, it had just been throwing that pulp away, but in 2006 it decided to make use of it by launching a new product called Minute Maid Pulpy. Sold primarily in Asian countries, Pulpy has become a billion-dollar business for Coca-Cola.

Once MailChimp is done with its primary business of sending emails, it has a lot of pulp of its own in the form of data. And rather than just ignoring it or writing up some cute blog posts (which he also does), Foreman and his bosses want to turn that data into revenue.

First things first: Making better orange juice

Neil Bainton

Actually, though, MailChimp first brought in Foreman in 2011 to help the company improve its core business of letting users build and send their emails. MailChimp’s culture was built around many things, COO Neil Bainton told me, but data wasn’t one of them. It had “various fits and starts” through the years trying to work data into its business model, and each step just added more complexity.

The challenges were technological as well as cultural, but Foreman had a plan, of which focus was a key aspect. Keeping a tight focus meant Foreman and his lone-developer sidekick could build what they needed to in a short timeframe. It also meant the company didn’t have to worry about some massive overnight transformation into a data-obsessed company like Google.

John Foreman

“[They] don’t need to be afraid the entire culture is gonna fall down if we bring in this weird math guy,” he joked.

Foreman’s first project, deploying artificial intelligence models that would automatically detect spammy email lists from MailChimp’s users, is actually critical to the way MailChimp operates, though. It was up and running in production within a year, after a technologically challenging effort of merging separate database instances for each customer into a single environment that would let MailChimp run complex analyses across its customer base.

It’s such an important project, Foreman explained, because internet service and email providers keep reputation scores on the IP addresses that send email through their systems. Because MailChimp serves as the email engine for its millions of users, sending too many messages that get flagged as spam and lower MailChimp’s reputation will have a negative impact on everyone. The company used to deal with spam manually, and only after recipients began complaining about the messages they received.

“It used to be before we had that AI model in place that everyone had a crappier experience,” Foreman said.

Say goodbye to those ’90s fans, Pearl Jam

Source: MailChimp

Now, however, MailChimp knows some of the telltale signs of spam for which it should be on the lookout. If too high a percentage of email addresses on a given list are also available via publicly available lists or those you can buy on sketchy corners of the internet, it’s probably spam. Too many old and far-more-likely-to-be-dead Earthlink or Compuserve addresses, or letters within one keystroke of each other as if someone just mashed the keyboard? Probably spam.
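The signals described above can be sketched as simple per-list ratios. This is only an illustration of the idea, not MailChimp's model: the article does not describe the actual features or thresholds, so the keyboard layout, the 0.75 cutoff, the function names and the output keys below are all assumptions, and only the two legacy domains come from the text.

```python
from typing import Dict, Iterable, Set

# Illustrative assumptions: the two legacy domains are the article's examples;
# the keyboard rows, the 0.75 threshold and the output names are invented here.
LEGACY_DOMAINS = {"earthlink.net", "compuserve.com"}
KEYBOARD_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")


def _adjacent(a: str, b: str) -> bool:
    """Return True if two letters are neighbours on the same keyboard row."""
    for row in KEYBOARD_ROWS:
        if a in row and b in row:
            return abs(row.index(a) - row.index(b)) == 1
    return False


def looks_mashed(local_part: str) -> bool:
    """Flag local parts where most consecutive letters are keyboard neighbours."""
    letters = [c for c in local_part.lower() if c.isalpha()]
    if len(letters) < 4:
        return False  # too short to judge
    pairs = list(zip(letters, letters[1:]))
    hits = sum(_adjacent(a, b) for a, b in pairs)
    return hits / len(pairs) >= 0.75


def list_signals(addresses: Iterable[str], known_public: Set[str]) -> Dict[str, float]:
    """Compute the share of a list that trips each heuristic.

    `known_public` is a hypothetical set of lowercase addresses already seen
    on public or purchasable lists.
    """
    cleaned = [a.strip().lower() for a in addresses]
    total = len(cleaned)
    if total == 0:
        return {"public_share": 0.0, "legacy_share": 0.0, "mashed_share": 0.0}
    public = sum(a in known_public for a in cleaned)
    legacy = sum(a.rsplit("@", 1)[-1] in LEGACY_DOMAINS for a in cleaned)
    mashed = sum(looks_mashed(a.split("@", 1)[0]) for a in cleaned)
    return {
        "public_share": public / total,
        "legacy_share": legacy / total,
        "mashed_share": mashed / total,
    }
```

A real system would presumably feed ratios like these into a trained model rather than compare each one to a fixed cutoff.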

Thankfully, though, about 98 percent of the spam that MailChimp identifies is what Foreman calls “ignorant” — that is, people or companies that just don’t know the laws or best practices around sending emails. But ignorance doesn’t mean MailChimp relaxes its rules. Recently, it even flagged Pearl Jam for spammy practices because the band was trying to reconnect with old fans whose email addresses read like a who’s who list of 1990s email providers.

Having such a high percentage of ignorant spam actually has a positive effect on the company’s overall goal of monetizing its vast data repositories. Because the AI model automates what used to be a manual process, and because most innocent spammers will fall in line quickly once they’re notified (as opposed to nefarious spammers who constantly try to outsmart the system), MailChimp can pretty much set the model loose, forget about it and get to work on new efforts, Foreman said.

Now, about that pulp

Spam under control, MailChimp can focus its efforts on actually building new products with data, just like Coca-Cola did with that extra pulp. One of its first orders of business is figuring out how to help customers better understand the people to whom they’re sending their newsletters.

With this in mind, the company built a service called Wavelength that shows customers other newsletters that are similar to theirs. But the system that powers Wavelength also stores pretty much every interaction that every email address in the company’s database has with the newsletters they’re sent. That means what emails they open and when they open them, what links they click and when they click them, and what other newsletters they’re subscribed to. MailChimp also has a feature called Ecommerce360 that lets customers track clicks right through to conversions (marketing speak for someone actually buying something).

The company has been playing around with this data to identify clusters of users based on their behaviors and their interests — some of which Foreman has detailed on the company’s blog — and now it wants to roll it out to customers via a product MailChimp is calling ChimpQuery. Built atop Google’s BigQuery analytics service, ChimpQuery will let customers start doing this type of clustering and segmentation on their own, while saving MailChimp the troubles of hosting that infrastructure itself. (You can play with a monstrous, interactive graph of the entire MailChimp subscriber list here.)

If you sell knitting supplies and you find out there’s a big cluster of people on your mailing list who also are interested in wedding planning and custom jewelry, there might be an opportunity to create your content with these interests in mind or even to partner with companies in those spaces.

A sample cluster of subscribers.
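To make the knitting example concrete, here is a minimal sketch of the simplest version of the idea: counting which other interests are most common among one list's subscribers. It is not how MailChimp or ChimpQuery performs clustering, which the article does not detail. The function name, the data shape and the sample interests are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def top_co_interests(
    subscriber_interests: Dict[str, Set[str]],
    own_topic: str,
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank other interests by the share of this list's subscribers who hold them.

    `subscriber_interests` maps a (hypothetical) subscriber id to the set of
    topics that subscriber follows, including the list's own topic.
    """
    total = len(subscriber_interests)
    if total == 0:
        return []
    counts: Counter = Counter()
    for interests in subscriber_interests.values():
        counts.update(interests - {own_topic})
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(topic, count / total) for topic, count in ranked[:k]]


knitting_list = {
    "subscriber-a": {"knitting", "wedding planning", "custom jewelry"},
    "subscriber-b": {"knitting", "wedding planning"},
    "subscriber-c": {"knitting", "custom jewelry", "wedding planning"},
    "subscriber-d": {"knitting", "cooking"},
}
print(top_co_interests(knitting_list, "knitting", k=2))
```

A shop owner could read the resulting shares as a rough guide to which adjacent topics or partners are worth testing.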

Another topic that has been on Foreman’s mind lately is what he calls “frequency elasticity of engagement.” He’s done research suggesting that blasting the heck out of your email list might actually have detrimental effects in the long term (regardless of how the Obama campaign successfully exploited this strategy) but noted that engagement also has a lot to do with content and a particular company’s given user list. MailChimp’s data could help customers figure out the ideal schedule for emailing their subscribers.

For example, Birchbox has really high engagement because people love the service and have to open their emails to find out what goodies they’re receiving. Emails from a company like Papa John’s, on the other hand, might sit in someone’s inbox essentially as spam until they want to order a pizza and go searching for a coupon. Everyone has to figure out what pace and engagement metrics work for them.

Reining expectations back in

However, now that management is fully sold on the power of data, Foreman sometimes finds himself managing expectations rather than just pitching his ideas. COO Bainton, for example, is adamant that MailChimp start aiding its publishing-industry customers by using techniques such as natural-language processing and semantic analysis to help them personalize emails based on readers’ stated and unstated interests (that is, what boxes they check when they sign up and what stuff they actually click on).

Foreman, well, he’s pretty sure that’s too big a challenge for MailChimp to tackle considering how many publishing customers it has. MailChimp would have to understand all those customers’ industries to some degree (open source tools tend to highlight technically but not situationally relevant relationships, he said, and don’t always understand things like sarcasm) and probably the different languages they publish in, as well. Rather than understand content, he’d rather focus personalization efforts around how users are connected.

The company also needs to balance its ambitions with what’s legally and socially acceptable. The creep factor might be more important than what’s legal when it comes to email marketing. MailChimp determines the legality of everything it does before rolling it out, Foreman explained, but in an era of “post-modern spam” where legitimacy is in the eye of the recipient and where some people use their “spam” button as a proxy for unsubscribing, companies must be careful not to offend.

“The more we can tell you about that list without getting creepy is really useful,” Bainton said. However, he added, “I think expectation is more important than law.”


Jan 06

Not many companies launch by accident, but that’s exactly how London startup Pusher first showed its face to the world back in 2010. Co-founder Damien Tanner had intended to invite a friend to take a look at the real-time web service he’d been working on — and then made an all-too-familiar Twitter slip.

“I meant to send a direct message, but I ended up sending a public @ message to someone,” he laughs. “Suddenly there were people who were following us both who had signed up and were saying it looked interesting.”

That early moment of calamity turned out to be a blessing, however. The initial interest was strong enough to encourage the team to keep developing and now, just a couple of years later, the business is gaining plaudits — and customers.

So what is Pusher? Co-founder and CEO Max Williams describes the service as a set of tools that make it simple to add real-time functions to other websites or services. And at heart, that’s it: an API that lets people offload some potentially tricky work.

“We allow developers to make applications real-time, so that users don’t have to refresh the page, and they automatically get information streamed to them where they are,” he explains. “But it’s a flexible enough service that it can be used for all sorts of things.”

Indeed it is, pulling in clients as diverse as MailChimp, Slideshare and the U.S. Open tennis tournament. Gaming companies are latching on to Pusher to provide multiplayer experiences without having to build out lots of infrastructure or admin, and the business is constantly expanding the ways for people to hook into its REST API.

Building this sort of thing could be achieved in-house, of course — and many web apps currently do it for themselves — but Pusher hopes that the idea of handing off the server and administration load to somebody else will appeal because it allows developers to make and deploy real-time elements quickly and at scale and focus on their product.

“If you have 10,000 people looking at a page, waiting for some sports scores or tweets, you send one message to us and we relay it to those 10,000 people,” says Tanner. “Previously you’d have had an Ajax thing requesting new information from the server every second or every five seconds — and if you have 10,000 people doing that at once it’s actually a scaling challenge. We make it so much easier by having a websocket connection with the browser, which is just a pipe we can just push data down.”
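Tanner's comparison can be put into rough numbers. The sketch below is an in-memory stand-in for the publish-and-relay pattern he describes, not Pusher's API or client library: the class and function names are hypothetical, and the callbacks simply stand in for open WebSocket connections.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class FanOutHub:
    """Toy relay: one publish call is delivered to every subscriber of a channel."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        """Register a callback that stands in for one connected browser."""
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        """Relay one message to all subscribers and return how many received it."""
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


def polling_requests(clients: int, poll_interval_s: int, window_s: int) -> int:
    """Requests the origin server handles if every client polls on a fixed interval."""
    return clients * (window_s // poll_interval_s)


hub = FanOutHub()
received: List[str] = []
for _ in range(10_000):
    hub.subscribe("scores", received.append)

delivered = hub.publish("scores", "2-1")  # the origin sends a single message
print(delivered, polling_requests(10_000, 1, 60), polling_requests(10_000, 5, 60))
```

Under these assumptions, 10,000 clients polling once a second generate 600,000 requests a minute against the origin server whether or not anything changed, while the relay model costs the origin one publish call per update.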

Like so many projects, Pusher was initially developed to scratch its founders’ own itch. When the pair’s last business, web consultancy New Bamboo, started building its own products, the team discovered it was coming across one particular issue time and again.

“We ran into the same problem: synchronization across browsers,” says Williams. “Someone’s messing around changing things, but those changes aren’t reflected on somebody else’s browser… so then they start to get out of sync.”

“We built what would become Pusher to solve that problem — but when we came around to implementing it, it only took a few hours. We suddenly thought that to have this problem solved in a few hours is actually a much more interesting thing than the original applications we built. So we put them to one side and started working on getting Pusher out.”

And it was good timing. Growing the real-time web is hot right now: not just through the explosion of activity around Twitter over the last few years, but also through the growth in mobile apps and the spread of into businesses. And while some may feel ambivalent about the data deluge it creates, there’s no doubt that many more companies are looking at ways to speed up what they do. And if that’s the case, then providing real-time as a service could be lucrative. At least that’s the feeling of Pusher’s investors, including London-based Passion Capital and the founders of cloud app platform Heroku, who gave the company $1 million in funding last summer.

40 billion messages and counting

Six months on, Williams and Tanner are cagey about Pusher’s numbers, but say that the service now has 10,000 registered users, hundreds of active ones and some high-profile partnerships — and is running close to break-even. And with 40 billion messages delivered, they are hoping that it can become a fundamental foundation on which hundreds or thousands of new real-time businesses can be built.

“A lot of people have their existing infrastructure or their existing applications, and they’re not going to rebuild the whole thing today,” says Tanner. “So a lot of people are adding Pusher on to the side or as an element to add some real-time parts.

“But there are people exploring building fully real-time web apps, where all the communication works over WebSockets. As we move into the future, we see that being the primary method of communication between a service and the browser. You’re still always going to send audio and video through HTML or streaming methods, but for interacting with web applications, WebSockets is the ultimate technology to use.”
