Building a Database of News Content

2023-07-09

Photo: justgrimes

Strictly speaking, data journalism is about telling stories with numbers.

But let's use a wider definition in this article. For me, the data used to tell a story can take many forms, including plain facts (historical, biographical, geographical) and different multimedia formats (interviews, photo reports, video footage). The idea here is to give you a broad view of what data journalism is and how you can use its lens to get great content ideas.

Identifying Data

The first step is to identify the data you can use. But before that, you will have to define your needs.

What topic do you want to cover? Are you following a seasonal trend (fashion week, movie premiere, holidays), or do you want to cover your field with content that does not change over time, also known as cold content (dog breeds, recipes)?

Whereas cold content data is easy to find online and can make great long-tail content, warmer content is tougher to obtain. You have to be aware of any news event in your field and be ready to act as fast as possible, with many competitors doing the same thing at the same time.

Once you have identified the kind of data you are looking for, you need to get your hands on it.

Open data is a great way to get free data that has been gathered by governmental organisations. Data Catalogs is a good starting point for your search. This site provides open data sources for a number of countries.

Reshaping New York: From buildings to bike lanes to painting over Broadway, how the city changed in 12 years under Mayor Michael R. Bloomberg.

For example, let's say you are interested in Queensland oral health services. You can get a monthly extract of statistics about waiting lists per clinic. Likewise, you can get crime statistics for incidents in Washington, DC.

These open data catalogs are a good source when you are looking for specific data. But you can also browse them to get new content ideas.

When surfing through open data sites, it is important to keep an open mind: you can dig up some seemingly unrelated data and find a new edge for your content.

For instance, flipping through the US Social Security dataset can inspire some ideas, such as babies named after characters from movies or TV shows, or after celebrities, for specific countries or periods of time. You can correlate this data with movie premieres or sports events in the news to add some depth to it.
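
As a minimal sketch of that kind of correlation, the Python snippet below compares how often a name was given in the years before and after an event such as a movie premiere. The function name, the three-year window and the yearly counts are all illustrative assumptions; the figures are placeholders, not real Social Security data.

```python
from statistics import mean


def change_around_event(counts_by_year, event_year, window=3):
    """Return mean yearly count after an event divided by the mean before it.

    counts_by_year maps a year to the number of babies given a name.
    The window (in years) is an arbitrary choice for this sketch.
    """
    before = [counts_by_year[y]
              for y in range(event_year - window, event_year)
              if y in counts_by_year]
    after = [counts_by_year[y]
             for y in range(event_year, event_year + window)
             if y in counts_by_year]
    if not before or not after:
        raise ValueError("not enough years around the event")
    if mean(before) == 0:
        raise ValueError("name was not used before the event")
    return mean(after) / mean(before)


# Made-up yearly counts for one name (placeholder values, not real figures)
sample_counts = {1998: 40, 1999: 44, 2000: 42, 2001: 90, 2002: 130, 2003: 140}

ratio = change_around_event(sample_counts, event_year=2001)
print(f"Average yearly count changed by a factor of {ratio:.2f}")
```

A jump after the event is only a lead worth investigating, not proof that the event caused it.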

The Google Books n-gram viewer is a great tool to visualize words appearing in books for a specific period of time and a given language.

For example, the following graph shows the presence of four names of Lord of the Rings characters in the literature – namely Arwen, Eowyn, Galadriel and Isildur – with a first peak in 1979 and the second one in 2004 and 2005.

If you are looking for warm content, two sources can help you identify what is trending in a specific country or language right now or at a specific time: the Wikipedia article traffic statistics and Google Trends.

Both sites can also help you identify patterns in the searched terms.

For example, "dog breed" is mostly searched for on Sundays in the US, while the matching English Wikipedia page is mostly viewed on Tuesdays. Well, I can't make much of it, but if your content is about dogs, maybe you can. At the very least, you know when to publish your content and when to promote it on social media to get the maximum impact.
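
If you export daily figures from one of these sites, a weekday pattern like this one can be checked with a few lines of Python. This is only a sketch: the export format (a mapping of dates to view counts), the function name and the sample numbers are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def average_by_weekday(daily_views):
    """Average a {date: views} mapping per weekday name."""
    buckets = defaultdict(list)
    for day, views in daily_views.items():
        # date.weekday() returns 0 for Monday through 6 for Sunday
        buckets[WEEKDAYS[day.weekday()]].append(views)
    return {name: mean(values) for name, values in buckets.items()}


# Made-up view counts for two consecutive weeks, starting on a Monday
sample_values = [100, 140, 110, 105, 95, 80, 90,
                 104, 150, 108, 101, 97, 84, 92]
sample_views = {date(2014, 6, 2 + offset): views
                for offset, views in enumerate(sample_values)}

averages = average_by_weekday(sample_views)
best_day = max(averages, key=averages.get)
print(f"Busiest weekday on average: {best_day}")
```

With real data, a longer period than two weeks would be needed before reading anything into the result.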

Google Trends goes a bit further, as it links the reasons for peaks to specific news and can give results down to the city level, which is a good way to get insights from local audiences.

For instance, fashion week peaked between February 9 and 15 in the US (for the NYC fashion week), and between April 6 and 12 in Australia (for the Sydney fashion week).

So depending on your audience, you know when to prepare for warm content a bit in advance. A good rule to keep in mind is that local audiences are generally more interested in local news first.

Storifying Data

As in any other kind of journalism, who, what, when and where are the main questions to answer in data journalism. The why is not so easy to convey, as it requires real analysis that is hard to express in visual elements. The why needs more space and more words to be explained thoroughly.

To transform pure data into a good story, you need to find an edge, which means asking yourself what specific question you want to answer.

All 9,866,539 buildings in the Netherlands, shaded according to year of construction.

Finding a narrative can also bring life to the facts and data you gathered.

One good starting point is to identify which pieces of data do not fit well with the rest: the outliers. In statistics, an outlier is a point distant from other observations. In the Washington, DC, crime statistics dataset, it could be an area more prone to a specific type of crime, or a type of crime that rarely happens during the day. In short, anything that departs from the pattern in the data can give you an edge.

But while outliers can make a good story, they can just as well be a sign of an error in the data, so don't forget to double-check them.
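
One common way to spot candidate outliers is the interquartile range rule (Tukey's fences), sketched below in Python. The article does not prescribe any particular method, so treat this as one option among several; the area labels and incident counts are made up for illustration.

```python
from statistics import quantiles


def flag_outliers(values_by_label, k=1.5):
    """Return labels whose value lies outside Q1 - k*IQR or Q3 + k*IQR.

    k=1.5 is the conventional multiplier for Tukey's fences.
    """
    q1, _, q3 = quantiles(values_by_label.values(), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(label for label, value in values_by_label.items()
                  if value < low or value > high)


# Made-up incident counts per area (placeholder values)
sample_incidents = {
    "Area A": 12, "Area B": 14, "Area C": 13, "Area D": 15,
    "Area E": 11, "Area F": 14, "Area G": 13, "Area H": 90,
}

flagged = flag_outliers(sample_incidents)
print("Worth a second look:", flagged)
```

Anything flagged this way is a lead to verify against the source, since the rule cannot tell a real anomaly from a data entry error.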

Most of the time, the data gathered can raise different questions. While it is useful to offer your users different ways to interact with the same data, for instance by providing both a map view and a chart view, it is also a good opportunity to create different pieces of content from the same dataset.

An overview story can raise general awareness of your subject. This broad view can be illustrated through views closer to your audience (both topically and geographically), resulting in much more content in the end.

As an example, let's brainstorm on the topic of fashion weeks. It can be addressed from different points of view:

map of the next fashion weeks with faceted search (Haute Couture/ready-to-wear, fashion houses, men/women collections)

comparison of colors/patterns/materials between different years of a local fashion week

comparison of colors/patterns/materials between different local fashion weeks for the same season

biography timelines of creators/fashion houses, with representative works

Testimonials are also a great way to bring life to your content. In Boston Marathon Moment 4:09:43, the goal was to go beyond the catastrophic event and to provide insight from regular people who were there when it took place. With such an overwhelming event, the casualty figures tend to distance the audience from the people behind the numbers.

By putting faces and voices before numbers, the readers were able to get a feeling of belonging to the story, of being part of it. This example is particularly strong, but keep in mind that data can be perceived as very dry, and testimonials can bring emotion back to it.

Visualizing Data

Sometimes, the simplest visualization is the best one to convey your point. As a matter of fact, you should refrain from putting visualization and interaction before your content, as it still has to tell a story.

After this customary word of caution, let's look at the different types of visualizations, grouped by the kind of data you want to show, each of them conveying a particular meaning.

A Tour Through the Visualization Zoo groups them in five sets:

Time series (temporality): index chart, stacked graph, small multiples, horizon graph

Statistical distributions: stem-and-leaf plot, Q-Q plot, scatter plot matrix (SPLOM), parallel coordinates

Maps (geography, spatial representation): flow map, choropleth map, graduated symbol map, cartogram

Hierarchies: node-link diagram, dendrogram, adjacency diagram (sunburst, icicle), enclosure diagram (squarified treemap, circle-packing)

Networks (interconnections): force-directed layout, arc diagram, matrix views

Christian Behrens, on the other hand, groups them in eight sets:

Correlations: bubble chart, scatterplot

Continuous quantities: sparklines, stacked area chart, multiset line chart, simple line chart

Discrete quantities: span chart, isometric bar chart, stacked bar chart, dot matrix, multiset bar chart, simple bar chart

Proportions: ring chart, simple pie chart

Flows: thread arcs, Sankey diagram

Hierarchies: tree diagram, tree map

Networks: diagram map, relation circle, pearl necklet

Spatial configurations: topographic map, thematic map

Another approach to choosing the best graphic form is to think about what you would like to show: comparison (among items or over time), relationship and distribution (depending on the number of variables), or composition (changing over time or static).
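
This way of thinking can be sketched as a simple lookup table in Python. The four goals come from the paragraph above, but the detail labels and the chart suggested for each case are my own simplified assumptions, not a transcription of any particular chart chooser.

```python
# Hypothetical mapping from (goal, detail) to a commonly used chart type.
# The chart names are a simplification chosen for this sketch.
CHART_SUGGESTIONS = {
    ("comparison", "among items"): "bar chart",
    ("comparison", "over time"): "line chart",
    ("relationship", "two variables"): "scatter plot",
    ("relationship", "three variables"): "bubble chart",
    ("distribution", "one variable"): "histogram",
    ("distribution", "two variables"): "scatter plot",
    ("composition", "static"): "pie chart",
    ("composition", "changing over time"): "stacked area chart",
}


def suggest_chart(goal, detail):
    """Return a suggested chart type for what you want to show."""
    try:
        return CHART_SUGGESTIONS[(goal, detail)]
    except KeyError:
        raise ValueError(f"no suggestion for {goal!r} / {detail!r}") from None


print(suggest_chart("comparison", "over time"))
```

A table like this is only a starting point: the simplest chart that conveys your point remains the better choice.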

The following chart suggestion tool was designed by Andrew Abela and is available on his Extreme Presentation method site.

Once you have chosen the best type of visualization to express your message, you still have to implement it. Many JavaScript libraries can be used; see, for instance, Thomas Greco's article for a short list.

If you prefer to keep your hands away from the code, some services can help you create embeddable visualizations directly from your data:

Datawrapper is used in many newspapers' digital editions. This free, open source project aims to be a tool for data-driven journalism.

Tableau Public is a free tool to create interactive visualizations and publish them on the web.

Infogr.am is a free application to create infographics and charts.

Story Maps combines interactive maps and multimedia content. It provides some interesting apps, like the spyglass or the map journal, that offer a predefined user experience. This service is based on the ArcGIS mapping platform, which has a program for nonprofit organizations.

Inspiring Sites

While the Snow Fall longform piece was one of the first widely shared multimedia stories, The New York Times keeps publishing great content in data visualization, explanatory graphics and other visual and interactive features. They compile their most emblematic productions in yearly pages (2012, 2013), which are well worth a look for inspiration.

The Guardian Interactive lists the latest interactive stories published on their site, from multimedia longform articles to interactive infographics and videos.

The Texas Tribune website has a whole section that lets users interact with data. Even though this data is freely available elsewhere, as it comes from public data sources, this section accounts for 45% to 55% of the site's total traffic. It clearly shows that if you provide data targeted to your audience in a meaningful and engaging way, your visitors will come back for it, use it and share it, generating traffic that can easily be leveraged through social media.

La Nación, a newspaper from Argentina, showcases its best data visualization articles on its site.

Le Monde, a French newspaper, also presents its visualizations in a dedicated section.

As you can see, a lot of newspapers try to spice up their digital editions with interactive dataviz. While the most renowned ones, with their big data teams, are a good source of inspiration, you should also make your own selection of less globally renowned ones, as a smaller team can come up with an innovative idea that may inspire your own content.

Dive Deeper into the Subject

This article is mostly focused on finding content ideas from data sources, to feed your ideation process. But this is only a very narrow view of data journalism. If you are interested in the multiple activities involved in data journalism, here are some links to go deeper:

Data Driven Journalism is an initiative of the European Journalism Centre. On top of being a hub for news and resources, it provides a MOOC entitled Doing Journalism with Data: First Steps, Skills and Tools. The course is now closed and available in archive mode only until the end of the year, but it is really worth the time you can dedicate to it.

The Data Journalism Handbook is a free open source reference book from both the European Journalism Centre and the Open Knowledge Foundation. It covers data journalism in newsrooms, case studies, and how to get, understand and deliver data to your audience.

School of Data, backed by the Open Knowledge Foundation, provides a lot of tools and articles to deal with data extraction, cleaning, analysis, presentation and sharing.

I hope you now have a good overall idea of what data journalism is, and especially how it can help you come up with content ideas. Using different data sources, correlating unrelated data to freshen up your point of view, getting exposed to visualization techniques, and most of all getting inspired by what others are doing: these are the lessons you should take away from this article.

This article is also available in French.

Translated from: https://www.sitepoint.com/generate-content-ideas-data-journalism/
