Agora’s journey to building its Big Data department started three years ago when management decided to focus on innovation and monetise data.
Speaking at the Big Data for Media Week Conference in London on Thursday, Luiza Pawela shared her journey as the Big Data director of Agora, building her company’s Big Data department from scratch.
Agora is the parent company of many prominent media outlets in Poland, such as the news site Gazeta Wyborcza, the radio station Złote Przeboje, and the television channel Stopklatka TV. Agora is also listed on the Warsaw Stock Exchange.
The company’s Big Data department has grown rapidly, from no data team and no infrastructure to six developers, two analysts, and two Hadoop cluster admins. Pawela’s team reports directly to the company’s management board.
Build vs. buy
In the beginning, Pawela faced the age-old question of whether Agora should build its own data system or buy one from a third party. Agora decided to build. As one of the largest media groups in Poland, with very diverse product offerings, Agora needed a customised data system.
“Requirements were so specific and varied that we couldn’t find suitable external tools,” Pawela said.
Staff ran the calculations and predicted substantial savings from building an in-house system. More importantly, they decided to focus on building skills and knowledge in-house: “Knowledge is one of the most important assets for the company,” Pawela said.
Agora’s decision criteria included data integration, data enrichment, limiting data leakage, content structuring, data liberation, and novel data-centric products and services.
Big Data department’s roles
The Big Data department has become an increasingly integral part of the organisation. Its main functions now include:
- Data capture: The team developed an advanced mechanism for tracking users’ behaviour. It gives a wide range of information about users’ interactions while remaining lightweight in users’ browsers. Agora captures on average one billion pageviews, four billion additional events, and 50 million unique users monthly from all its Web sites.
- User profiling: User profiling was one of the first systems developed because it can generate revenue. User profile data are also used for marketing communication, content personalisation, and analytical purposes.
- Content personalisation: Agora uses various algorithms to personalise content for its audience, using parameters such as refresh frequency, content metadata, popularity metrics, and pageviews. The main use cases include content placement on its sites and support for content marketing campaigns.
- Content structuring: The Big Data department has also developed a system for automatic tagging, a project that requires very specific skills. Agora has hired scientists with PhDs in natural language processing to do this. Optimising the system requires cooperation with domain experts, editors, and SEO specialists.
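To make the content personalisation item above more concrete, here is a minimal sketch of how signals such as content metadata, popularity, and freshness could be combined into a single ranking score. It is an illustration only, not Agora’s algorithm: the `Article` fields, the `score` and `rank` functions, the 12-hour half-life, and the way the signals are multiplied are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Article:
    # All fields are hypothetical; they stand in for the kinds of
    # signals mentioned in the article (metadata, pageviews, freshness).
    title: str
    tags: set          # content metadata, e.g. {"sport", "football"}
    pageviews: int     # popularity signal
    age_hours: float   # hours since publication


def score(article, user_interests, half_life_hours=12.0):
    """Combine popularity, freshness and topical match into one number."""
    # Log-scale pageviews so a few viral articles do not dominate.
    popularity = math.log1p(article.pageviews)
    # Exponential decay: the score halves every `half_life_hours`.
    freshness = 0.5 ** (article.age_hours / half_life_hours)
    # Count tags shared between the article and the user's profile.
    overlap = len(article.tags & user_interests)
    return popularity * freshness * (1 + overlap)


def rank(articles, user_interests):
    """Return articles ordered from highest to lowest score."""
    return sorted(articles, key=lambda a: score(a, user_interests), reverse=True)
```

Calling `rank(articles, {"sport"})` would place fresh, popular sport articles ahead of equally popular articles on other topics, and ahead of older sport articles. A production system would tune or learn these weights rather than hard-code them.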
Agora’s Big Data journey has been a successful but challenging one. Pawela highlights the company’s three success factors:
- Competent team, including programming, DevOps, and data science skills.
- Close cooperation between engineers and analysts.
At the same time, staff have faced several challenges along the way:
- Resistance within the organisation at the beginning.
- Data sparsity.
- Struggling with the technology 80% of the time and only doing data science 20% of the time.
Pawela and her team are currently working on a new project on multi-dimensional dashboards for content.