How Precog Is Mining Through Big Data For Ecommerce Gold, With John De Goes

Every day, massive amounts of social and interaction data are collected by web sites, mobile applications, and more on how consumers and users are using those sites and applications. How does a company take that vast amount of data, make sense of it, and not only analyze it but use it to drive things like product recommendations and personalization--and ultimately, more revenue and profit? Boulder-based Precog is taking on that task, with its own platform to make it much easier to create products around that data. We talked with John De Goes, CEO of the company, to learn about big data, how the firm is parsing that data to help e-commerce and other companies, and how the company ended up focused here.

What is Precog all about?

John De Goes: Precog is a cloud-based data warehousing and analysis platform. It's designed to do one thing very, very well, which is to help companies create productive data assets. That's where we fit into the whole ecosystem. Obviously, we live in an era of big data, where massive amounts of information are captured by cell phones, web apps, and server-side apps. Everywhere, massive volumes of information are being created, captured, and stored. One of the key challenges facing companies is that they have this pile of information stacking up in their data warehouses, and the next step is to figure out how to turn those data assets into products and features that can drive incremental revenue. Precog is a technology platform to help companies do exactly that.

How would your typical customer use your service?

John De Goes: A typical customer is a tech company that has some kind of software application which captures a large amount of interaction and behavior data. One of our customers is Flixmaster, an online video platform which is kind of like YouTube, but specializing in interactive, branching video. Their software is used by large companies like NBC to create interactive videos and maximize engagement. One key challenge they have is the huge amount of information they get when users interact with video: what people interacted with, what they're looking at, what they're clicking on. Capturing all that data, and doing something useful with it, is a really, really big challenge.

They use Precog's capabilities to do very advanced analytics on their data set, and roll that into self-service reporting. Companies like NBC can log into a dashboard, gain insight into what users are doing, what is popular, what is not popular, and where users are coming from, and use those insights to optimize their current and future videos on Flixmaster. That's fairly typical. There are lots of companies out there building web, server, or mobile apps who are looking to take that data and turn it into products they can sell to other companies, or into features of their existing products.

Amazon is probably the most famous company out there capturing all the interactions that every single person has with their e-commerce platform. Every time you go to an Amazon page, they capture the interaction with that page. Together with data about where you are from and what you have ordered, they have a huge amount of information, measured in petabytes. The way Amazon uses that to drive incremental revenue is through very sophisticated analytic capabilities, which they turn into specific features of their e-commerce platform. In their case, they do two things. One is personalization, where every page is personalized for you to maximize the probability that you'll buy what you came in to buy, and other things besides. Number two, they do some very, very interesting recommendations. When you buy things, you are shown other things you might also be interested in buying. That's just one example of a data-driven company effectively using the wealth of information pouring in every second of every day from human interactions with products, and productizing that data into features of an e-commerce platform. Those features end up having tremendous impact on the bottom line.
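To make the recommendation idea concrete: one of the simplest approaches behind "customers who bought this also bought" features is item-to-item co-occurrence counting. The sketch below is purely illustrative (the function names and toy order data are invented for this example); it is not Amazon's or Precog's actual system, which operate at petabyte scale with far more sophistication.

```python
from collections import Counter, defaultdict

def build_co_occurrence(orders):
    """Count how often each pair of items appears in the same order."""
    co = defaultdict(Counter)
    for items in orders:
        unique = set(items)
        for a in unique:
            for b in unique:
                if a != b:
                    co[a][b] += 1
    return co

def recommend(co, item, k=3):
    """Return the items most often bought alongside `item`."""
    return [other for other, _ in co[item].most_common(k)]

# Toy purchase history: each inner list is one order.
orders = [
    ["book", "lamp"],
    ["book", "lamp", "desk"],
    ["book", "desk"],
    ["lamp", "mug"],
]
co = build_co_occurrence(orders)
print(recommend(co, "book"))  # lamp and desk co-occur with book most often
```

Real systems replace the raw counts with similarity scores (cosine, lift) and precompute the tables offline over the full interaction log, but the shape of the computation is the same.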

What's your background, and how did you get into this?

John De Goes: I was VP of Engineering at an advertising platform company, and I was brought in to turn around the technology. What I ended up doing in my time there was building an engineering team and architecting a next-generation advertising platform, which scaled to billions of impressions. One of the cornerstone features of that platform was the ability to serve personalized ads based on what was known about a user. We'd pull in tweets from their friends, and other things like that, in order to influence people to click on ads. It had one very interesting feature, which was dynamic creative optimization. We'd experiment with ad variations and converge on those that performed the highest. My biggest challenge in running that engineering team and building the ad platform was designing, architecting, and maintaining a system that could scale to billions and billions of ad data points a month. Every time someone interacted, we'd store the interaction and all of its details, including the time, the page, where the person was from, their geolocation and other information about them, how long they interacted, what they did with the ad, and whether they went on to convert. There were mind-boggling amounts of information on the users, especially if those ads were running in widespread ad campaigns. Storing that was the easy part, even though it ran into the petabytes. The really hard part is productizing that data.
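The "experiment with ad variations and converge on those that performed the highest" idea is commonly implemented as a multi-armed bandit. The sketch below shows one simple strategy, epsilon-greedy; the function names and sample numbers are assumptions for illustration, not a description of the platform De Goes actually built.

```python
import random

def epsilon_greedy(ad_stats, epsilon=0.1):
    """Pick an ad variation: usually the best performer, occasionally a random one.

    ad_stats maps ad id -> (clicks, impressions).
    """
    if random.random() < epsilon:
        # Explore: try a random variation to keep gathering data.
        return random.choice(list(ad_stats))
    # Exploit: serve the variation with the highest observed click-through rate.
    return max(ad_stats, key=lambda ad: ad_stats[ad][0] / max(ad_stats[ad][1], 1))

def record(ad_stats, ad, clicked):
    """Update the stats after serving one impression."""
    clicks, impressions = ad_stats[ad]
    ad_stats[ad] = (clicks + int(clicked), impressions + 1)

stats = {"variant_a": (30, 1000), "variant_b": (55, 1000)}
chosen = epsilon_greedy(stats, epsilon=0.0)  # pure exploitation picks variant_b
record(stats, chosen, clicked=True)
```

Over many impressions the loop naturally shifts traffic toward the better-performing creative, which is the "convergence" the interview describes; production systems use more statistically careful variants such as Thompson sampling.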

For our advertising platform, we needed to productize the data in multiple ways. When advertisers came into the platform to run ads, they wanted to see what effect that advertising was having: what users were doing, conversion rate, click-through rate, all that stuff. So we built a self-service reporting tool on top of this huge amount of data. We also wanted to productize it in other ways. One was, when you were showing an ad to a person, you'd want to determine which was the most effective ad to show them. That's actually a problem in data analysis and data mining. Finally, we wanted to be able to iteratively converge on the dynamic creative optimization, which required components of machine learning, statistics, and all sorts of other things. Those were three different ways we wanted to productize that massive amount of data, but we found there were no tools out there to do it. We ended up using all of the open source technology like Hadoop, MongoDB, and the other technologies from the open source ecosystem, and we still had to labor for about a year before we got something with the right features in it. It was a colossal undertaking. The reason is that there's an impedance mismatch: the level of abstraction of the open source big data and analytics functionality is hard to translate into features in an application. It took us a year, and in some cases two years, and we were still not there on some of it, with a team of twenty engineers working on it. That's why, when our company was acquired by LivingSocial, I decided to leave LivingSocial and found the company that later became Precog. I wanted to solve this problem once and for all: to provide very effective, high-level tools to turn data assets, however large they may be, into data products and data-driven products.

Was the TechStars experience helpful for you?

John De Goes: It was tremendously helpful. TechStars helps in a lot of ways. One is their network of connections. There's a huge network you can leverage, which can get you connected to lots of different companies. That serves two purposes. One, it gives you early customers. But it also gives you early feedback on what you're trying to do. The whole mentality at TechStars is that you shouldn't just come up with a crazy idea and build a company around it. Rather, you should talk to people, learn about their pain, and figure out what you can do to build solutions that meet the needs they have. It's called customer development, and the philosophy around it, which the companies embrace wholeheartedly, is that for literally one month, you don't write a line of code. You talk to people who have the problem you are trying to solve. We talked with more than a hundred companies during the course of TechStars, and learned all about their stuff. We learned that we were not alone, and weren't the only company going through that pain. In fact, we ran into dozens, hundreds, thousands who had gone through the same trials and tribulations we had. That was enormously helpful. Finally, because it's very competitive to get into TechStars--seven hundred to a thousand people apply, and they only let in 10 or so per class--your idea is vetted by lots and lots of people. Getting into TechStars gives you credibility in the eyes of both potential customers and investors. We were able to raise money even before TechStars had finished, closed out the angel round about a month or so later, and ended up oversubscribed by $250K. TechStars made all of that much easier.

Where are you now, and what are the next steps for your company?

John De Goes: We're working really hard to get our product into public beta. We just launched public beta two to two and a half weeks ago. It's the first time in the history of our company that our technology has been publicly available. We're working extraordinarily hard. We have a big team--we're now at sixteen people, of which twelve are engineers. That's a huge engineering team relative to our size, working on what is a phenomenally hard problem. I found the brightest engineers I could get my hands on, and hand-recruited every single member of the team. They are open source contributors, book authors, and language developers. They're really sharp guys, experts in computer science and machine learning. We're working extraordinarily hard to get our product out there, in the hands of people. What we want to do with this public beta is get it into more people's hands, get them using our technology in production, and get lots of feedback so we can refine it and make sure it's scalable, is performant, and has the initial set of features needed for our target market. We hope to go out of beta into production sometime in Q2 or Q3 of 2013.