Contents
Notes | Environment | About this section | Code
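Notes
The code in this post comes from the open-source project 《动手学深度学习》 (Dive into Deep Learning), PyTorch edition. I have added extensive comments based on my own understanding while studying it, to make the principle and purpose of each function easier to follow.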
Environment
Python 3.8; platform: Windows 10; IDE: PyCharm
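About this section
This section corresponds to Section 10.6 of the book and shows how to find synonyms and analogies using pretrained word vectors. Because the section is relatively simple, the code carries fewer comments than usual. It requires the pretrained GloVe word vectors; the download from the official site is slow, so they can be downloaded from this link instead.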
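If the vectors are downloaded by hand, it helps to know what the file looks like. To my knowledge, the unzipped GloVe text files (for example glove.6B.50d.txt) are plain text with one token per line, followed by its space-separated float components. The sketch below parses that layout from a made-up three-line sample rather than the real file. `parse_glove_lines` is a hypothetical helper for illustration, not part of torchtext, and the sample values are invented.

```python
import io


def parse_glove_lines(lines):
    """Parse 'token v1 v2 ...' lines into (stoi, itos, vectors)."""
    stoi, itos, vectors = {}, [], []
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        token = parts[0]
        values = [float(v) for v in parts[1:]]
        stoi[token] = len(itos)  # token -> row index
        itos.append(token)       # row index -> token
        vectors.append(values)
    return stoi, itos, vectors


# Made-up 3-dimensional sample (assumption); the real 50d file has 50 floats per line.
sample = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\ndog 0.7 0.8 0.9\n")
stoi, itos, vectors = parse_glove_lines(sample)
print(len(stoi), stoi["cat"], itos[2], vectors[1])
```

The names `stoi` and `itos` mirror the attributes used on the torchtext object in the listing that follows, so the two lookups can be compared directly.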
Code
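# plt is imported in the original listing but not used in this section
from matplotlib import pyplot as plt
import torch
import torchtext.vocab as vocab

# List every pretrained embedding alias, then only the GloVe ones
print(vocab.pretrained_aliases.keys())
print([key for key in vocab.pretrained_aliases.keys() if "glove" in key])

# Load the 50-dimensional GloVe 6B vectors, cached in a local directory
cache_dir = "D:/Program/Pytorch/Datasets/glove"
glove = vocab.GloVe(name='6B', dim=50, cache=cache_dir)
print("The vocabulary contains %d words." % len(glove.stoi))
# stoi maps token -> index, itos maps index -> token
print(glove.stoi['beautiful'], glove.itos[3366])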
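def knn(W, x, k):
    # Cosine similarity between every row of W (vocab_size x dim) and the query x;
    # 1e-9 keeps the division stable for all-zero rows
    cos = torch.matmul(W, x.view((-1,))) / (
        (torch.sum(W * W, dim=1) + 1e-9).sqrt() * torch.sum(x * x).sqrt())
    # Indices of the k most similar rows
    _, topk = torch.topk(cos, k=k)
    topk = topk.cpu().numpy()
    return topk, [cos[i].item() for i in topk]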
def get_similar_tokens(query_token, k, embed):
    # Ask for k + 1 neighbours because the closest one is the query token itself
    topk, cos = knn(embed.vectors,
                    embed.vectors[embed.stoi[query_token]], k + 1)
    for i, c in zip(topk[1:], cos[1:]):  # skip the query token
        print('cosine sim=%.3f: %s' % (c, embed.itos[i]))
get_similar_tokens('chip', 3, glove)
get_similar_tokens('baby', 3, glove)
get_similar_tokens('beautiful', 3, glove)
def get_analogy(token_a, token_b, token_c, embed):
    # a is to b as c is to ?  ->  look near vec(b) - vec(a) + vec(c)
    vecs = [embed.vectors[embed.stoi[t]]
            for t in [token_a, token_b, token_c]]
    x = vecs[1] - vecs[0] + vecs[2]
    topk, cos = knn(embed.vectors, x, 3)
    # Print the three nearest candidates; the input tokens are not filtered out
    for i, c in zip(topk, cos):
        print('origin word = %s cosine sim=%.3f: %s' % (token_c, c, embed.itos[i]))
    return embed.itos[topk[0]]
get_analogy('man', 'woman', 'son', glove)
get_analogy('beijing', 'china', 'tokyo', glove)
get_analogy('bad', 'worst', 'big', glove)
get_analogy('do', 'did', 'go', glove)
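'''
The examples above are ones that happen to work well; try a few of your own
and you will find that this method is not very accurate.
For example, 'do', 'doing', 'play' returns play rather than playing.
'''
print("*" * 50)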
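One likely reason for misses such as 'play': the query vector vec(b) - vec(a) + vec(c) often stays closest to vec(c) itself, and get_analogy does not filter the input words out of the candidates. A common workaround is to skip the three input tokens when ranking. The sketch below shows the idea on a made-up 2-dimensional vocabulary in plain Python (no torch, so it runs anywhere). `analogy_excluding_inputs` and the toy vectors are hypothetical and chosen only to reproduce the effect; they are not from the book.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-9)  # 1e-9 avoids division by zero


def analogy_excluding_inputs(a, b, c, vectors, exclude_inputs=True):
    """Return the token closest to vec(b) - vec(a) + vec(c)."""
    x = [vb - va + vc
         for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    banned = {a, b, c} if exclude_inputs else set()
    candidates = [t for t in vectors if t not in banned]
    return max(candidates, key=lambda t: cosine(vectors[t], x))


# Toy embedding (assumption): invented values, not real GloVe vectors.
toy = {
    "do": [1.0, 0.2],
    "doing": [1.0, 0.6],
    "play": [0.2, 1.0],
    "playing": [0.5, 1.0],
    "run": [1.0, 1.0],
}

# Without filtering, the nearest neighbour is the input word itself.
print(analogy_excluding_inputs("do", "doing", "play", toy, exclude_inputs=False))  # play
# With the inputs excluded, the intended answer comes first.
print(analogy_excluding_inputs("do", "doing", "play", toy))  # playing
```

In the torch version, the same effect could be had by asking knn for three extra neighbours and dropping any index that belongs to an input token. Whether this fixes a given real GloVe query still has to be checked case by case.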