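Contents
Unity Android: Crawling News with jsoup and Wrapping It for Unity to Call
I. Introduction
II. How It Works
III. Notes
IV. Result Preview
V. Analyzing the Page to Identify the Data jsoup Needs to Parse
VI. Implementation Steps
Android side
Unity side
VII. Key Code
Android side
Unity side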
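I. Introduction
A short write-up of techniques that come up in Unity Android development, kept for my own future reference; all the better if it helps others.
This section covers how to wrap an Android-side jsoup crawler, which scrapes news from ifeng.com (https://news.ifeng.com/), into an interface that Unity can call. This is not the only way to do it, and corrections are welcome.
For using jsoup in a pure Android project, see this earlier post:
Android Studio crawler: a simple demo of scraping shopping product information with jsoup/okhttp3 (with detailed steps)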
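II. How It Works
1. okhttp3 fetches the HTML of the web page.
2. jsoup parses the HTML and extracts the fields that are needed.
3. The code is packaged as an aar for Unity to call.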
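III. Notes
1. The page's HTML tags and class names may change over time; always check against the current markup of the live page.
2. Remember to add the INTERNET permission in AndroidManifest.xml.
3. okhttp3 depends on okio (okio-1.16.0.jar is recommended), so that jar must be added as well.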
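V. Analyzing the Page to Identify the Data jsoup Needs to Parse
1. The target page to crawl
(URL: https://news.ifeng.com/)
2. Right-click the page and choose Inspect Element (Firefox is used here) to view the HTML source.
3. Click the element picker so that the HTML you select and the page highlight each other.
4. Target HTML for Elements (the list of news items): the li items inside ul.news-stream-basic-news-list
5. Target HTML for title: the h2 title element inside each item
6. Target HTML for source: the span inside div.clearfix
7. Target HTML for imgurl: the src attribute of the img inside a.news-stream-newsStream-image-link
8. Target HTML for commentNum: the a element with classes news-stream-newsStream-ly and ly inside div.clearfix
9. Target HTML for articleurl: the href attribute of a.news-stream-newsStream-image-link
(There are many ways to get it; I took it from the image link.)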
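The imgUrl and articleUrl values scraped from this page are protocol-relative (they start with //, as the sample JSON in the Unity code shows), and the Unity test script prepends "https:" before using them. As a minimal sketch, the same normalization could be done on the Java side with java.net.URI. The class UrlNormalizer and the method toAbsolute are hypothetical names for illustration and are not part of the original project.

```java
import java.net.URI;

// Hypothetical helper, not part of the original project.
final class UrlNormalizer {

    // The page the links were scraped from; used as the base for resolution.
    private static final URI BASE = URI.create("https://news.ifeng.com/");

    /**
     * Resolves a scraped link against the page URL.
     * Returns an empty string for null, blank, or malformed input.
     */
    static String toAbsolute(String scraped) {
        if (scraped == null || scraped.trim().isEmpty()) {
            return "";
        }
        try {
            // URI.resolve fills in the base scheme for "//host/path" links
            // and the base scheme and host for "/path" links.
            return BASE.resolve(scraped.trim()).toString();
        } catch (IllegalArgumentException e) {
            return "";
        }
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute("//news.ifeng.com/c/7zSquTL2haH"));
    }
}
```

Resolving against the page URL, rather than concatenating "https:", also covers links that are already absolute or that start with a single slash.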
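VI. Implementation Steps
Android side
1. Open Android Studio and create a new project or a new module.
2. Choose Module and click Next.
3. Add jsoup, okhttp, and okio to the module.
4. Write the code: NewsStruct is the data model class; OkHttpUtils fetches the page HTML; GetNewsStructDataUtil parses the HTML into a list of NewsStruct model data; GetNewsDataJsonStringListener is the callback listener interface invoked when the crawl thread has finished fetching the news data.
5. Once the code is done, run Build > Make Module 'xxxx' to build the aar.
6. After a successful build, the aar appears in the build folder under outputs/aar.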
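GetNewsDataJsonStringListener from step 4 is a plain callback interface: the crawl runs on a worker thread and the listener fires once the JSON string is ready. The shape of that hand-off can be sketched in plain Java without any Android classes. The names NewsJsonListener, CrawlTask, safeCrawl, and the Supplier standing in for the fetch-and-parse step are all hypothetical; the real project uses a Thread plus an Android Handler, as the key code later in this post shows.

```java
import java.util.concurrent.CountDownLatch;
import java.util.function.Supplier;

// Hypothetical stand-in for GetNewsDataJsonStringListener.
interface NewsJsonListener {
    void onNewsJson(String json);
}

// Hypothetical sketch: run a crawl step off the caller's thread, then notify a listener.
final class CrawlTask {

    /** Runs the crawl step and falls back to an empty JSON array on failure or null. */
    static String safeCrawl(Supplier<String> crawl) {
        try {
            String json = crawl.get();
            return json == null ? "[]" : json;
        } catch (RuntimeException e) {
            return "[]";
        }
    }

    /** Starts the crawl on a new thread and calls the listener exactly once when it ends. */
    static Thread start(Supplier<String> crawl, NewsJsonListener listener) {
        Thread worker = new Thread(() -> {
            String json = safeCrawl(crawl);
            if (listener != null) {
                listener.onNewsJson(json);
            }
        });
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        start(() -> "[{\"title\":\"demo\"}]", json -> {
            System.out.println("received: " + json);
            done.countDown();
        });
        done.await(); // keep main alive until the callback has run
    }
}
```

Always calling the listener, even with an empty array, means the caller never waits forever on a failed request. In this sketch the listener runs on the worker thread; the original code instead posts the result through a Handler.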
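Unity side
1. Import the aar built on the Android side into Unity; LitJson is used here, so import it as well.
2. Lay out the UI in the scene: a Grid with a Layout component arranges the fetched news items, and NewsDataItem is made into a prefab. The Grid's Layout is configured as follows.
3. In the Unity project, create the scripts and implement each one: JsoupCrawlerNewsDataWrapper calls into the Android aar to get the news data; GetNewsDataJsonStringListener is the listener class for when the news data is ready; NewsDataStruct is the news data structure; NewsDataItem is the script on the news prefab; MonoSingleton is the singleton base class; TestJsoupCrawlerNewsDataWrapper tests JsoupCrawlerNewsDataWrapper.
4. Attach TestJsoupCrawlerNewsDataWrapper to an object in the scene and assign its fields.
5. Run the scene; if there are no problems, build and test on a real device.
6. The result on a real device is shown in the preview above.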
VII. Key Code
Android side
1. GetNewsStructDataUtil
package com.example.javacrawlerwrapper;

import android.os.Handler;
import android.os.Message;
import android.util.Log;

import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;

public class GetNewsStructDataUtil {

    private static final String TAG = "Main_GetNewsStructDataUtil";
    final static String urlNews = "https://news.ifeng.com/";

    // Message id used when the crawl result is posted to the handler
    private static final int MSG_NEWS_READY = 2;

    // Selectors taken from the page markup; update them if the page changes
    private static final String SEL_LIST = "ul[class=news-stream-basic-news-list]";
    private static final String SEL_ITEM = "li[class=news-stream-newsStream-news-item-has-image clearfix news_item]";
    private static final String SEL_TITLE = "h2[class=news-stream-newsStream-mr13 news-stream-newsStream-news-item-title news-stream-newsStream-news-item-title-height]";
    private static final String SEL_INFO = "div[class=clearfix]";
    private static final String SEL_COMMENT = "a[class=news-stream-newsStream-ly ly]";
    private static final String SEL_IMAGE_LINK = "a[class=news-stream-newsStream-image-link]";

    private GetNewsDataJsonStringListener NewsDataJsonStringListener;

    public void GetNewsJsonString(GetNewsDataJsonStringListener newsDataJsonStringListener) {
        Log.i(TAG, "GetNewsJsonString: ");
        NewsDataJsonStringListener = newsDataJsonStringListener;
        toThreadNews();
    }

    /**
     * Receives the crawl result from the worker thread.
     * The handler is bound to the Looper of the thread that creates this object.
     */
    private Handler handler = new Handler() {
        @Override
        public void handleMessage(Message msg) {
            Log.i(TAG, "handleMessage: crawl finished");
            switch (msg.what) {
                case MSG_NEWS_READY:
                    String newsStructs = (String) msg.obj;
                    Log.i(TAG, "handleMessage: news json " + newsStructs);
                    if (NewsDataJsonStringListener != null) {
                        NewsDataJsonStringListener.GetNewsDataJsonStringOKAction(newsStructs);
                    }
                    break;
                default:
                    break;
            }
        }
    };

    /**
     * Crawls the news data on a worker thread.
     */
    private void toThreadNews() {
        Log.i(TAG, "toThreadNews: ");
        new Thread() {
            @Override
            public void run() {
                String html = OkHttpUtils.OkGetArt(urlNews);
                String newsStructs = spiderNews(html);
                // Post the result back to the handler
                Message message = handler.obtainMessage();
                message.what = MSG_NEWS_READY;
                message.obj = newsStructs;
                handler.sendMessage(message);
            }
        }.start();
    }

    /**
     * Parses the news items out of the page HTML.
     * @param html page HTML; may be null if the request failed
     * @return the parsed news items (empty if html is null)
     */
    private ArrayList<NewsStruct> spiderArticle(String html) {
        ArrayList<NewsStruct> newsStructs = new ArrayList<>();
        if (html == null) {
            // OkGetArt returns null on failure, and Jsoup.parse rejects null input
            return newsStructs;
        }
        Document document = Jsoup.parse(html);
        Elements elements = document.select(SEL_LIST).select(SEL_ITEM);
        for (Element element : elements) {
            String title = element.select(SEL_TITLE).text();
            String classify = "";
            String source = element.select(SEL_INFO).select("span").text();
            String imgUrl = element.select(SEL_IMAGE_LINK).select("img").attr("src");
            String commentNum = element.select(SEL_INFO).select(SEL_COMMENT).text();
            String articleUrl = element.select(SEL_IMAGE_LINK).attr("href");
            newsStructs.add(new NewsStruct(title, classify, source, imgUrl, commentNum, articleUrl));
        }
        return newsStructs;
    }

    /**
     * Converts the crawled data to a JSON string.
     * @param html page HTML; may be null if the request failed
     * @return a JSON array string with one object per news item
     */
    private String spiderNews(String html) {
        JSONArray arr = new JSONArray();
        for (NewsStruct news : spiderArticle(html)) {
            JSONObject json = new JSONObject();
            try {
                json.put("title", news.getTitle());
                json.put("classify", news.getClassify());
                json.put("source", news.getSource());
                json.put("imgUrl", news.getImgUrl());
                json.put("commentNum", news.getCommentNum());
                json.put("articleUrl", news.getArticleUrl());
            } catch (JSONException e) {
                e.printStackTrace();
            }
            arr.put(json);
        }
        return arr.toString();
    }
}
2. NewsStruct
package com.example.javacrawlerwrapper;

public class NewsStruct {

    private String title;
    private String classify;
    private String source;
    private String imgUrl;
    private String commentNum;
    private String articleUrl;

    public NewsStruct(String title, String classify, String source,
                      String imgUrl, String commentNum, String articleUrl) {
        this.title = title;
        this.classify = classify;
        this.source = source;
        this.imgUrl = imgUrl;
        this.commentNum = commentNum;
        this.articleUrl = articleUrl;
    }

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public String getClassify() { return classify; }
    public void setClassify(String classify) { this.classify = classify; }

    public String getSource() { return source; }
    public void setSource(String source) { this.source = source; }

    public String getImgUrl() { return imgUrl; }
    public void setImgUrl(String imgUrl) { this.imgUrl = imgUrl; }

    public String getCommentNum() { return commentNum; }
    public void setCommentNum(String commentNum) { this.commentNum = commentNum; }

    public String getArticleUrl() { return articleUrl; }
    public void setArticleUrl(String articleUrl) { this.articleUrl = articleUrl; }

    @Override
    public String toString() {
        return "NewsStruct{" +
                "title='" + title + '\'' +
                ", classify='" + classify + '\'' +
                ", source='" + source + '\'' +
                ", imgUrl='" + imgUrl + '\'' +
                ", commentNum='" + commentNum + '\'' +
                ", articleUrl='" + articleUrl + '\'' +
                '}';
    }
}
3. OkHttpUtils
package com.example.javacrawlerwrapper;

import java.io.IOException;

import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

public class OkHttpUtils {

    final static String TAG = "OkHttpUtils";

    /**
     * Fetches the page at the given URL with a blocking request.
     * Call this from a worker thread, not the main thread.
     * @param url page address
     * @return the response body as a string, or null if the request failed
     */
    public static String OkGetArt(String url) {
        String html = null;
        OkHttpClient client = new OkHttpClient();
        Request request = new Request.Builder()
                .url(url)
                .build();
        try {
            Response response = client.newCall(request).execute();
            // string() reads the whole body and closes it
            html = response.body().string();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return html;
    }
}
4. GetNewsDataJsonStringListener
package com.example.javacrawlerwrapper;

public interface GetNewsDataJsonStringListener {
    void GetNewsDataJsonStringOKAction(String strNewsJson);
}
5. AndroidManifest.xml
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.javacrawlerwrapper" >

    <uses-permission android:name="android.permission.INTERNET"/>

</manifest>
Unity side
1. JsoupCrawlerNewsDataWrapper
using System;
using System.Collections;
using System.Collections.Generic;
using UnityEngine;

namespace AndroidWrapper
{
    public class JsoupCrawlerNewsDataWrapper : MonoSingleton<JsoupCrawlerNewsDataWrapper>
    {
        /// <summary>
        /// Gets the news data as a raw JSON string.
        /// </summary>
        /// <param name="NewsDataOkAction">Callback invoked with the JSON string</param>
        public void GetNewsJsonString(Action<string> NewsDataOkAction)
        {
#if UNITY_EDITOR
            // The Android aar is not available in the editor, so nothing is called here.
#else
            MAndroidJavaObject.Call("GetNewsJsonString", new GetNewsDataJsonStringListener(NewsDataOkAction));
#endif
        }

        /// <summary>
        /// Gets the news data as a parsed list.
        /// </summary>
        /// <param name="NewsDataOkAction">Callback invoked with the parsed news list</param>
        public void GetNewsData(Action<List<NewsDataStruct>> NewsDataOkAction)
        {
#if UNITY_EDITOR
            // Fixed sample data for testing in the editor
            string tmpString = @"[{'title':'“卫生巾贫困”之外,落后地区女童生理知识更“贫困”','classify':'','source':'','imgUrl':'//d.ifengimg.com/w144_h80_q90/x0.ifengimg.com/ucms/2020_36/C57C6A42BAA0F755F4C8762F3295834DCD65CAC8_w532_h299.jpg','commentNum':'0','articleUrl':'//news.ifeng.com/c/7zSquTL2haH'},{'title':'“卫生巾贫困”之外,落后地区女童生理知识更“贫困”','classify':'','source':'','imgUrl':'//d.ifengimg.com/w144_h80_q90/x0.ifengimg.com/ucms/2020_36/C57C6A42BAA0F755F4C8762F3295834DCD65CAC8_w532_h299.jpg','commentNum':'0','articleUrl':'//news.ifeng.com/c/7zSquTL2haH'}]";
            // Cache the list so that GetNewsDataList() can return it later
            _ListNewsDataStructs = LitJson.JsonMapper.ToObject<List<NewsDataStruct>>(tmpString);
            NewsDataOkAction(_ListNewsDataStructs);
#else
            MAndroidJavaObject.Call("GetNewsJsonString", new GetNewsDataJsonStringListener((strNewsJson) =>
            {
                // Cache the list so that GetNewsDataList() can return it later
                _ListNewsDataStructs = LitJson.JsonMapper.ToObject<List<NewsDataStruct>>(strNewsJson);
                NewsDataOkAction(_ListNewsDataStructs);
            }));
#endif
        }

        /// <summary>
        /// Returns the most recently fetched news data.
        /// 1. The list may be empty.
        /// 2. Call GetNewsData first, and call this only after its callback has reported the data as ready.
        /// </summary>
        /// <returns>The cached news list</returns>
        public List<NewsDataStruct> GetNewsDataList()
        {
            return _ListNewsDataStructs;
        }

        #region Private fields

        // News data
        private List<NewsDataStruct> _ListNewsDataStructs = new List<NewsDataStruct>();

        AndroidJavaObject _mAndroidJavaObject;

        public AndroidJavaObject MAndroidJavaObject
        {
            get
            {
                if (_mAndroidJavaObject == null)
                {
                    _mAndroidJavaObject = new AndroidJavaObject("com.example.javacrawlerwrapper.GetNewsStructDataUtil");
                }
                return _mAndroidJavaObject;
            }
        }

        #endregion
    }
}
2. GetNewsDataJsonStringListener
using System;
using System.Collections;
using System.Collections.Generic;
using UnityEngine;

namespace AndroidWrapper
{
    /// <summary>
    /// Proxy for the Android interface that reports when the news data is ready.
    /// </summary>
    public class GetNewsDataJsonStringListener : AndroidJavaProxy
    {
        // Callback supplied by Unity code, invoked when the news data has been fetched
        Action<string> _mNewsDataJsonStringOKAction;

        public GetNewsDataJsonStringListener(Action<string> NewsDataJsonStringOKAction)
            : base("com.example.javacrawlerwrapper.GetNewsDataJsonStringListener")
        {
            _mNewsDataJsonStringOKAction = NewsDataJsonStringOKAction;
        }

        /// <summary>
        /// Called from Android when the crawl thread has finished fetching the news data.
        /// </summary>
        /// <param name="strNewsJson">The fetched news as a JSON string</param>
        public void GetNewsDataJsonStringOKAction(string strNewsJson)
        {
            if (_mNewsDataJsonStringOKAction != null)
            {
                _mNewsDataJsonStringOKAction(strNewsJson);
            }
        }
    }
}
3. NewsDataStruct
using System.Collections;
using System.Collections.Generic;
using UnityEngine;

namespace AndroidWrapper
{
    /// <summary>
    /// News data structure
    /// </summary>
    [System.Serializable]
    public class NewsDataStruct
    {
        public string title;
        public string classify;
        public string source;
        public string imgUrl;
        public string commentNum;
        public string articleUrl;

        /// <summary>
        /// Parameterless constructor
        /// </summary>
        public NewsDataStruct() { }

        /// <summary>
        /// Constructor that sets every field
        /// </summary>
        public NewsDataStruct(string title, string classify, string source,
            string imgUrl, string commentNum, string articleUrl)
        {
            this.title = title;
            this.classify = classify;
            this.source = source;
            this.imgUrl = imgUrl;
            this.commentNum = commentNum;
            this.articleUrl = articleUrl;
        }

        /// <summary>
        /// Returns a readable dump of all fields.
        /// </summary>
        public override string ToString()
        {
            return "NewsStruct{" +
                "title='" + title + '\'' +
                ", classify='" + classify + '\'' +
                ", source='" + source + '\'' +
                ", imgUrl='" + imgUrl + '\'' +
                ", commentNum='" + commentNum + '\'' +
                ", articleUrl='" + articleUrl + '\'' +
                '}';
        }
    }
}
4. NewsDataItem
using System;
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Networking;
using UnityEngine.UI;

namespace AndroidWrapper
{
    public class NewsDataItem : MonoBehaviour
    {
        public RawImage Img_RawImage;
        public Text Ttile_Text;
        public Text Url_Text;

        private string image_url;
        private string title;
        private string news_url;

        // Setting the image URL starts the download and shows the result in the RawImage
        public string Image_url
        {
            get => image_url;
            set
            {
                image_url = value;
                GetTexture(image_url, SetTexttureToRawImage);
            }
        }

        public string Title
        {
            get => title;
            set
            {
                title = value;
                Ttile_Text.text = title;
            }
        }

        public string News_url
        {
            get => news_url;
            set
            {
                news_url = value;
                Url_Text.text = news_url;
            }
        }

        /// <summary>
        /// Requests an image.
        /// </summary>
        /// <param name="url">Image address, e.g. 'http://www.my-server.com/image.png'</param>
        /// <param name="actionResult">Callback that receives the downloaded texture, or null on failure</param>
        public void GetTexture(string url, Action<Texture2D> actionResult)
        {
            StartCoroutine(_GetTexture(url, actionResult));
        }

        /// <summary>
        /// Assigns the downloaded texture to the RawImage.
        /// </summary>
        /// <param name="texture">Texture to display</param>
        void SetTexttureToRawImage(Texture texture)
        {
            if (Img_RawImage != null)
            {
                Img_RawImage.texture = texture;
            }
        }

        /// <summary>
        /// Coroutine that downloads the image.
        /// </summary>
        /// <param name="url">Image address, e.g. 'http://www.my-server.com/image.png'</param>
        /// <param name="actionResult">Callback that receives the downloaded texture, or null on failure</param>
        IEnumerator _GetTexture(string url, Action<Texture2D> actionResult)
        {
            Debug.Log(GetType() + "/_GetTexture()/ ");
            UnityWebRequest uwr = new UnityWebRequest(url);
            DownloadHandlerTexture downloadTexture = new DownloadHandlerTexture(true);
            uwr.certificateHandler = new MyCertHandler();
            uwr.downloadHandler = downloadTexture;
            yield return uwr.SendWebRequest();

            Texture2D t = null;
            if (!(uwr.isNetworkError || uwr.isHttpError))
            {
                Debug.Log(GetType() + "/_GetTexture()/ download succeeded");
                t = downloadTexture.texture;
            }
            else
            {
                Debug.Log("Download failed; check the network and the download address");
            }

            if (actionResult != null)
            {
                Debug.Log(GetType() + "/_GetTexture()/ actionResult != null");
                actionResult(t);
            }
        }

        /// <summary>
        /// Inner class that works around the certificate error:
        /// Curl error 35: Handshake did not perform verification. UnityTls error code: 7
        /// Note: it accepts every certificate, which turns off TLS verification.
        /// </summary>
        public class MyCertHandler : CertificateHandler
        {
            protected override bool ValidateCertificate(byte[] certificateData)
            {
                return true;
            }
        }
    }
}
5. MonoSingleton
using UnityEngine;

public abstract class MonoSingleton<T> : MonoBehaviour where T : MonoBehaviour
{
    private static T instance = null;
    private static readonly object locker = new object();
    private static bool bAppQuitting;

    public static T Instance
    {
        get
        {
            if (bAppQuitting)
            {
                instance = null;
                return instance;
            }

            lock (locker)
            {
                if (instance == null)
                {
                    instance = FindObjectOfType<T>();
                    if (FindObjectsOfType<T>().Length > 1)
                    {
                        Debug.LogError("There should not be more than one singleton instance!");
                        return instance;
                    }

                    if (instance == null)
                    {
                        // No instance in the scene yet, so create one
                        var singleton = new GameObject();
                        instance = singleton.AddComponent<T>();
                        singleton.name = "(singleton)" + typeof(T);
                        singleton.hideFlags = HideFlags.None;
                        DontDestroyOnLoad(singleton);
                    }
                    else
                    {
                        DontDestroyOnLoad(instance.gameObject);
                    }
                }

                instance.hideFlags = HideFlags.None;
                return instance;
            }
        }
    }

    protected virtual void Awake()
    {
        bAppQuitting = false;
    }

    protected virtual void OnDestroy()
    {
        bAppQuitting = true;
    }
}
6. TestJsoupCrawlerNewsDataWrapper
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
using AndroidWrapper;

public class TestJsoupCrawlerNewsDataWrapper : MonoBehaviour
{
    public Transform parent;
    public GameObject NewsDataItem_Prefab;

    void Start()
    {
        // Log the raw JSON string
        JsoupCrawlerNewsDataWrapper.Instance.GetNewsJsonString((newsJsonStr) =>
        {
            Debug.Log(GetType() + "/Start()/ newsJsonStr : " + newsJsonStr);
        });

        // Create one list item per news entry
        JsoupCrawlerNewsDataWrapper.Instance.GetNewsData((newsListStruct) =>
        {
            foreach (NewsDataStruct item in newsListStruct)
            {
                Debug.Log(GetType() + "/Start()/ newsListStruct : " + item.ToString());
                GameObject tmp = Instantiate(NewsDataItem_Prefab, parent, false);
                tmp.transform.localScale = Vector3.one;

                NewsDataItem newsItem = tmp.GetComponent<NewsDataItem>();
                // The scraped URLs start with "//", so prepend the scheme
                newsItem.Image_url = "https:" + item.imgUrl;
                newsItem.Title = item.title;
                newsItem.News_url = "https:" + item.articleUrl;
            }
        });
    }
}
