How to Import Data and Export Results in R

2022-09-18

With the craze for “big” data, analytics tools have gained popularity. One of these tools is the programming language R. In this post, I’ll show how to extract data from text files, CSV files, and databases. Then I’ll show how to send that data to a web server.

You may be wondering, Do I need to learn a new language all over again? The answer is no! All you need to know is a few commands.

Programmers from diverse backgrounds who work on web applications in a variety of programming languages can import the data into R and, after processing, export it in the format they require.

Note: If you’re not familiar with R, I recommend SitePoint’s article on how to install R and RStudio. It provides basic commands in R and a general introduction to the language. This post covers commands that can be run on the R terminal without the use of the RStudio IDE. However, handling large datasets on a terminal could turn out to be difficult for beginners, so I’d suggest using RStudio for an enriched experience. In RStudio, you can run the same commands in the Console box.

Handling Text Files

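A text file on your local machine can be read using the read.table command. Because it's designed for reading tables, it splits every line into fields: with the separator set to an empty string (""), which is also the default, fields are split on any run of whitespace. (To get whole lines as strings instead, use readLines.) The basic call looks like this: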

file_contents = read.table("<path_to_file>", sep = "")

Note: where you see angled brackets such as in <path_to_file>, insert the necessary number, identifier, etc. without the brackets.

The path to the file may also be the relative path to the file. If your rows have unequal length, you have to set fill = TRUE as well. The output of this command is a data frame in R.
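As a minimal sketch of the fill option, the snippet below writes a small whitespace-separated file to a temporary location and reads it back. The file contents and the variable name tmp are made up for illustration and are not part of the original example.

```r
# Write a small whitespace-separated file whose rows have unequal length.
tmp <- tempfile(fileext = ".txt")
writeLines(c("a 1 x", "b 2", "c 3 z"), tmp)

# fill = TRUE adds blank fields to the short row instead of raising an error.
file_contents <- read.table(tmp, sep = "", fill = TRUE)

print(dim(file_contents))  # number of rows and columns
print(file_contents)
```

Without fill = TRUE, read.table stops with an error as soon as it meets the two-field row.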

If your file is too large to be read in one go, you can read it in steps using the skip and nrows options. For instance, to read lines 6–10 of your file, run the following commands:

connection <- file("<path_to_file>")
lines6_10 = read.table(connection, skip = 5, nrows = 5) # lines 6-10
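The same idea extends to walking through a whole file chunk by chunk. The following is only a sketch under assumed conditions: the 12-line sample file, the chunk size of 5, and the names tmp, chunk_size, and chunks are all invented for the demonstration.

```r
# Create a 12-line sample file, one whitespace-separated record per line.
tmp <- tempfile(fileext = ".txt")
writeLines(paste(1:12, letters[1:12]), tmp)

chunk_size <- 5
# Counting lines with readLines is fine for a demo; for a truly huge file
# you would need another way to know when to stop.
total_lines <- length(readLines(tmp))

# Read the file in chunks: skip what was already read,
# then take at most chunk_size rows.
chunks <- list()
for (start in seq(0, total_lines - 1, by = chunk_size)) {
  chunks[[length(chunks) + 1]] <- read.table(tmp, skip = start, nrows = chunk_size)
}

lines6_10 <- chunks[[2]]  # the second chunk holds lines 6-10
print(lines6_10)
```

Each chunk is an ordinary data frame, so it can be processed and discarded before the next one is read.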

Handling CSV Files

A CSV (comma-separated values) file is a file that, quite literally, contains values separated by commas. You can read a CSV file using the read.csv command:

file_contents = read.csv("<path_to_file>")

The header option states whether the CSV file contains column headers. It is set to TRUE by default in read.csv. (It can also be specified when reading text files with read.table, where it defaults to FALSE.) Rows with fewer fields than the rest are handled automatically, because read.csv sets fill to TRUE by default.

For large files, you can skip rows in a similar manner:

connection <- file("<path_to_file>")
# header = FALSE keeps line 6 from being consumed as a header row
lines6_10 = read.csv(connection, skip = 5, nrows = 5, header = FALSE) # lines 6-10
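Skipping rows in a CSV file also skips the header line, so the column names are lost. One possible way around this, sketched here with an invented sample file and the hypothetical names tmp and col_names, is to read the names once and reattach them:

```r
# Sample CSV: one header line followed by nine data rows.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:9, score = seq(10, 90, by = 10)), tmp, row.names = FALSE)

# Read the column names once from the header line...
col_names <- names(read.csv(tmp, nrows = 1))

# ...then read file lines 6-10 as data and reattach the names.
# header = FALSE stops read.csv from treating line 6 as a header.
lines6_10 <- read.csv(tmp, skip = 5, nrows = 5, header = FALSE, col.names = col_names)
print(lines6_10)
```

Because line 1 of the file is the header, file lines 6-10 correspond to the fifth through ninth data rows.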

Using MySQL Databases

To make database connections, you need the separate RMySQL library. It can be installed using the following command:

install.packages('RMySQL')

Once installed, you need to activate it by running the following:

library('RMySQL')

Assuming that your database is running, you can now make MySQL queries after establishing a connection:

con <- dbConnect(MySQL(), user="root", password="root", dbname="nsso", host="localhost", port=8889)

If you’re running MySQL through MAMP on a Mac, you need to specify a unix.socket:

con <- dbConnect(..., unix.socket = "/Applications/MAMP/tmp/mysql/mysql.sock")

To make a MySQL query, you first need to execute the query and then store the data in a data frame:

rs <- dbSendQuery(con, "SELECT * FROM my_table;")
# Make sure you add a LIMIT if your result is too large
data <- fetch(rs, n = -1)
dbClearResult(rs) # release the result set

Once you’re done with your queries, you can disconnect your connection through the dbDisconnect command:

dbDisconnect(con)

Read Data on the Web

What if your data source is on the web? How do you read online files? In R, it can be done simply by changing the file path that you specify in the read command. You need to use the url command and specify the URL in the read.csv command. For instance:

file_contents = read.csv(url("<file_URL>"))

For a database, the host may be changed to extract data from a database on a web server.

Export Data

Mirroring read.csv and read.table, the write.csv and write.table commands export a data frame to a CSV or text file:

write.csv(data_frame, file = "data.csv")

To export as a text file using a different delimiter (say, a tab), you can use the write.table command:

write.table(data_frame, file = "data.txt", sep = "\t")
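To check that an export worked, you can read the files straight back in. The sketch below uses a made-up two-row data frame and temporary file paths (csv_path and txt_path are illustrative names); it also sets row.names = FALSE, an option not used in the commands above, to keep the automatic row numbers out of the files.

```r
data_frame <- data.frame(name = c("Ann", "Bob"), age = c(31, 45))

csv_path <- file.path(tempdir(), "data.csv")
txt_path <- file.path(tempdir(), "data.txt")

# row.names = FALSE omits the automatic row-number column;
# without it, the file gains an extra unnamed first column.
write.csv(data_frame, file = csv_path, row.names = FALSE)
write.table(data_frame, file = txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read both files back to confirm the round trip.
from_csv <- read.csv(csv_path)
from_txt <- read.table(txt_path, sep = "\t", header = TRUE)
print(from_csv)
print(from_txt)
```

Whether to keep the row names depends on the consumer of the file; many other tools expect only the data columns.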

Updating databases is just as easy, and can be done by executing UPDATE and INSERT MySQL commands.

Export Graphs

Once you’ve processed your data in R, you can export your plots too! The png or jpeg command does that for you. It opens a graphics device that writes to a file; anything you plot afterwards goes into that file, which is saved when you call dev.off():

# Initiate Image
png(filename="sample.png")
# Make a plot
plot(c(1,2,3,4,5), c(4,5,6,7,8))
# Save the plot
dev.off()

You can replace the second command with whatever plot you need to save.

Export Data to the Web

Uploading files to the web directly might be a bit tricky, but you can export data to the web in two steps: save a file locally, then upload it. You can upload a file from R with a POST request, which can be sent using the httr package:

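library(httr)
# The field names "name" and "filedata" depend on what the receiving server expects
POST("<upload_URL>", body = list(name = "<path_to_local_file>", filedata = upload_file("<path_to_local_file>", "text/csv")))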

For more details, here’s a quickstart guide on the httr package.

Conclusion

R has gained a lot of popularity in recent years among people working with statistics, and now’s a good time to learn this wonderful language. It’s flexible enough to sync with various types of data sources, and working with R is very easy too, irrespective of your background. Let’s hope this post got you started with R!

Translated from: https://www.sitepoint.com/how-to-import-data-and-export-results-in-r/
