
Friendly reminder: because the HuggingFace community runs afoul of certain laws of the Celestial Empire, whenever "surfboard" is mentioned in this HuggingFace series it means getting over the wall, i.e. you need an overseas proxy tool.

Pipelines

Pipelines are models that have already been fully trained: even used as-is, with no further processing, they perform very well. Note, however, that a pipeline has to be downloaded on first use, and in later sessions, even with a copy saved locally, you still need the surfboard.
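If repeated downloads are a concern, one workaround (a sketch; the './my-sentiment-model' directory name is just an illustrative assumption) is to save the pipeline's underlying model and tokenizer to disk with save_pretrained() and rebuild the pipeline from that local copy:

```python
from transformers import pipeline

# Download once, then persist the underlying model and tokenizer locally.
classifier = pipeline('sentiment-analysis')
classifier.model.save_pretrained('./my-sentiment-model')
classifier.tokenizer.save_pretrained('./my-sentiment-model')

# Later, rebuild the pipeline from the local directory.
classifier = pipeline('sentiment-analysis', model='./my-sentiment-model')
print(classifier('I love you')[0])
```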

Using pipelines

Text classification

from transformers import pipeline
classifier = pipeline('sentiment-analysis')
result = classifier('I hate you')[0]
print(result)
result = classifier('I love you')[0]
print(result)
{'label': 'NEGATIVE', 'score': 0.9991129040718079}
{'label': 'POSITIVE', 'score': 0.9998656511306763}

As you can see, the pipeline code is very concise: pass the task type to the pipeline() function, and the return value is a classifier object that can run the actual prediction task; call that object on a concrete sentence and it returns a concrete prediction. This example predicts the sentiment of the two sentences "I hate you" and "I love you". The output shows they are classified as NEGATIVE and POSITIVE respectively, each with a score above 0.99, so the model is very confident in its predictions.
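The classifier also accepts a list of sentences and returns one prediction per input, which is handy for scoring a batch at once; a minimal sketch (the third sentence is just an illustrative addition):

```python
from transformers import pipeline

classifier = pipeline('sentiment-analysis')

# A list input yields a list of {'label': ..., 'score': ...} dicts, one per sentence.
results = classifier(['I hate you', 'I love you', 'This movie was a waste of time'])
for result in results:
    print(result)
```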

Reading comprehension

question_answerer = pipeline('question-answering')
context = r"""
Extractive Question Answering is the task of extracting an answer from a
text given a question. An example of a question answering dataset is the
SQuAD dataset, which is entirely based on that task. If you would like to
fine-tune a model on a SQuAD task, you may leverage the
examples/PyTorch/questionanswering/run_squad.py script.
"""
result = question_answerer(
    question="What is extractive question answering?",
    context=context,
)
print(result)
result = question_answerer(
    question="What is a good example of a question answering dataset?",
    context=context,
)
print(result)
{'score': 0.5949987173080444, 'start': 34, 'end': 95, 'answer': 'the task of extracting an answer from a text given a question'}  
{'score': 0.5136864185333252, 'start': 147, 'end': 160, 'answer': 'SQuAD dataset'}

In this example, pipeline() is first called with 'question-answering' as the argument, which returns the question_answerer object. context is a passage of text, the target the model has to read and comprehend; feeding context together with a question about it into the question_answerer object yields the corresponding answer.
Note: the answer must appear somewhere in the context, because the model works by extracting an answer span from the context; if the answer is not in the context, the model cannot possibly find it.
The example asks two questions about the context, so two answers are returned.
For the first question, "What is extractive question answering?", the model answers "the task of extracting an answer from a text given a question".
For the second question, "What is a good example of a question answering dataset?", the model answers "SQuAD dataset".
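If a single best span is not enough, recent versions of the question-answering pipeline accept a top_k argument that returns several candidate answers (a sketch reusing a shortened version of the context above):

```python
from transformers import pipeline

question_answerer = pipeline('question-answering')
context = ("Extractive Question Answering is the task of extracting an answer "
           "from a text given a question. An example of a question answering "
           "dataset is the SQuAD dataset, which is entirely based on that task.")

# top_k asks for the k best candidate spans instead of only the single best one.
results = question_answerer(
    question="What is a good example of a question answering dataset?",
    context=context,
    top_k=2,
)
for candidate in results:
    print(candidate)
```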

Fill-mask (cloze)

unmasker = pipeline("fill-mask")

sentence = 'HuggingFace is creating a <mask> that the community uses to solve NLP tasks.'
unmasker(sentence)
[{'score': 0.1792752891778946, 'token': 3944, 'token_str': ' tool', 'sequence': 'HuggingFace is creating a tool that the community uses to solve NLP tasks.'},
 {'score': 0.113493911921978, 'token': 7208, 'token_str': ' framework', 'sequence': 'HuggingFace is creating a framework that the community uses to solve NLP tasks.'},
 {'score': 0.05243545398116112, 'token': 5560, 'token_str': ' library', 'sequence': 'HuggingFace is creating a library that the community uses to solve NLP tasks.'},
 {'score': 0.034935273230075836, 'token': 8503, 'token_str': ' database', 'sequence': 'HuggingFace is creating a database that the community uses to solve NLP tasks.'},
 {'score': 0.02860250696539879, 'token': 17715, 'token_str': ' prototype', 'sequence': 'HuggingFace is creating a prototype that the community uses to solve NLP tasks.'}]

The masked sentence asks what HuggingFace is creating that the community uses to solve NLP tasks. The model returns five answers in descending order of confidence: "tool", "framework", "library", "database", and "prototype".
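The fill-mask pipeline can also be restricted to candidate words you choose via the targets argument (a sketch; the two targets are just illustrative, and whether they need a leading space depends on the model's tokenizer):

```python
from transformers import pipeline

unmasker = pipeline("fill-mask")
sentence = 'HuggingFace is creating a <mask> that the community uses to solve NLP tasks.'

# Score only the supplied candidates; the default model's tokenizer
# uses space-prefixed tokens, hence the leading spaces.
results = unmasker(sentence, targets=[' tool', ' framework'])
for candidate in results:
    print(candidate['token_str'], candidate['score'])
```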

Text generation

text_generator = pipeline("text-generation")

text_generator("As far as I am concerned, I will", max_length=50, do_sample=False)
[{'generated_text': 'As far as I am concerned, I will be the first to admit that I am not a fan of the idea of a "free market." I think that the idea of a free market is a bit of a stretch. I think that the idea'}]  

In this code, once the text_generator object is obtained it is called directly, with the opening of a sentence as input, and text_generator continues writing from there. The max_length=50 argument caps the total length of the output, prompt included. In the result, the narrator admits to not being a fan of the idea of a "free market", finds the idea a bit of a stretch, and then begins to repeat itself.
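With do_sample=False the decoding is greedy and deterministic, which is part of why the continuation above starts to repeat itself. Turning sampling on draws tokens from the model's distribution, so each run can differ; num_return_sequences asks for several continuations at once (a sketch, with no output shown since it varies between runs):

```python
from transformers import pipeline

text_generator = pipeline("text-generation")

# Sampling makes generation stochastic; two different continuations are returned.
results = text_generator(
    "As far as I am concerned, I will",
    max_length=30,
    do_sample=True,
    num_return_sequences=2,
)
for generated in results:
    print(generated['generated_text'])
```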

Named entity recognition

ner_pipe = pipeline("ner")

sequence = """Hugging Face Inc. is a company based in New York
City. Its headquarters are in DUMBO, therefore very close to the
Manhattan Bridge which is visible from the window."""

for entity in ner_pipe(sequence):
    print(entity)
{'entity': 'I-ORG', 'score': 0.99957865, 'index': 1, 'word': 'Hu', 'start': 0, 'end': 2}
{'entity': 'I-ORG', 'score': 0.9909764, 'index': 2, 'word': '##gging', 'start': 2, 'end': 7}
{'entity': 'I-ORG', 'score': 0.9982224, 'index': 3, 'word': 'Face', 'start': 8, 'end': 12}
{'entity': 'I-ORG', 'score': 0.9994879, 'index': 4, 'word': 'Inc', 'start': 13, 'end': 16}
{'entity': 'I-LOC', 'score': 0.9994344, 'index': 11, 'word': 'New', 'start': 40, 'end': 43}
{'entity': 'I-LOC', 'score': 0.9993197, 'index': 12, 'word': 'York', 'start': 44, 'end': 48}
{'entity': 'I-LOC', 'score': 0.9993794, 'index': 13, 'word': 'City', 'start': 49, 'end': 53}
{'entity': 'I-LOC', 'score': 0.98625815, 'index': 19, 'word': 'D', 'start': 79, 'end': 80}
{'entity': 'I-LOC', 'score': 0.951427, 'index': 20, 'word': '##UM', 'start': 80, 'end': 82}
{'entity': 'I-LOC', 'score': 0.9336589, 'index': 21, 'word': '##BO', 'start': 82, 'end': 84}
{'entity': 'I-LOC', 'score': 0.9761654, 'index': 28, 'word': 'Manhattan', 'start': 114, 'end': 123}
{'entity': 'I-LOC', 'score': 0.9914629, 'index': 29, 'word': 'Bridge', 'start': 124, 'end': 130}

As the output shows, the model identifies Hugging Face Inc as an organization in the original text, and New York City, DUMBO, and Manhattan Bridge as locations.
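The subword pieces in the output ('Hu', '##gging', '##UM', ...) can be merged back into whole entities by asking the pipeline to aggregate them; in recent transformers versions this is the aggregation_strategy parameter (older versions used grouped_entities=True). A sketch:

```python
from transformers import pipeline

# 'simple' aggregation merges consecutive subword tokens into full entity spans.
ner_pipe = pipeline("ner", aggregation_strategy="simple")

sequence = """Hugging Face Inc. is a company based in New York
City. Its headquarters are in DUMBO, therefore very close to the
Manhattan Bridge which is visible from the window."""

entities = ner_pipe(sequence)
for entity in entities:
    print(entity)
```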

Summarization

Use a pipeline to handle a text summarization task.

from transformers import pipeline
summarizer = pipeline("summarization", model="Falconsai/text_summarization")

article = """
Hugging Face: Revolutionizing Natural Language Processing
Introduction
In the rapidly evolving field of Natural Language Processing (NLP), Hugging Face has emerged as a prominent and innovative force. This article will explore the story and significance of Hugging Face, a company that has made remarkable contributions to NLP and AI as a whole. From its inception to its role in democratizing AI, Hugging Face has left an indelible mark on the industry.
The Birth of Hugging Face
Hugging Face was founded in 2016 by Clément Delangue, Julien Chaumond, and Thomas Wolf. The name "Hugging Face" was chosen to reflect the company's mission of making AI models more accessible and friendly to humans, much like a comforting hug. Initially, they began as a chatbot company but later shifted their focus to NLP, driven by their belief in the transformative potential of this technology.
Transformative Innovations
Hugging Face is best known for its open-source contributions, particularly the "Transformers" library. This library has become the de facto standard for NLP and enables researchers, developers, and organizations to easily access and utilize state-of-the-art pre-trained language models, such as BERT, GPT-3, and more. These models have countless applications, from chatbots and virtual assistants to language translation and sentiment analysis.
Key Contributions:
1. **Transformers Library:** The Transformers library provides a unified interface for more than 50 pre-trained models, simplifying the development of NLP applications. It allows users to fine-tune these models for specific tasks, making it accessible to a wider audience.
2. **Model Hub:** Hugging Face's Model Hub is a treasure trove of pre-trained models, making it simple for anyone to access, experiment with, and fine-tune models. Researchers and developers around the world can collaborate and share their models through this platform.
3. **Hugging Face Transformers Community:** Hugging Face has fostered a vibrant online community where developers, researchers, and AI enthusiasts can share their knowledge, code, and insights. This collaborative spirit has accelerated the growth of NLP.
Democratizing AI
Hugging Face's most significant impact has been the democratization of AI and NLP. Their commitment to open-source development has made powerful AI models accessible to individuals, startups, and established organizations. This approach contrasts with the traditional proprietary AI model market, which often limits access to those with substantial resources.
By providing open-source models and tools, Hugging Face has empowered a diverse array of users to innovate and create their own NLP applications. This shift has fostered inclusivity, allowing a broader range of voices to contribute to AI research and development.
Industry Adoption
The success and impact of Hugging Face are evident in its widespread adoption. Numerous companies and institutions, from startups to tech giants, leverage Hugging Face's technology for their AI applications. This includes industries as varied as healthcare, finance, and entertainment, showcasing the versatility of NLP and Hugging Face's contributions.
Future Directions
Hugging Face's journey is far from over. As of my last knowledge update in September 2021, the company was actively pursuing research into ethical AI, bias reduction in models, and more. Given their track record of innovation and commitment to the AI community, it is likely that they will continue to lead in ethical AI development and promote responsible use of NLP technologies.
Conclusion
Hugging Face's story is one of transformation, collaboration, and empowerment. Their open-source contributions have reshaped the NLP landscape and democratized access to AI. As they continue to push the boundaries of AI research, we can expect Hugging Face to remain at the forefront of innovation, contributing to a more inclusive and ethical AI future. Their journey reminds us that the power of open-source collaboration can lead to groundbreaking advancements in technology and bring AI within the reach of many.
"""

summarizer(article, max_length=100, min_length=50, do_sample=False)
[{'summary_text': 'Hugging Face has emerged as a prominent and innovative force in NLP . From its inception to its role in democratizing AI, the company has left an indelible mark on the industry . The name "Hugging Face" was chosen to reflect the company\'s mission of making AI models more accessible and friendly to humans .'}]

If you want to know what the original article and the summary mean, you can run them through a machine translator yourself.

Translation

translator = pipeline("translation_en_to_de")

sentence = "Hugging Face is a technology company based in New York and Paris"
translator(sentence, max_length=40)
[{'translation_text': 'Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.'}]

In this code, pipeline() is first called with the argument translation_en_to_de, which returns the translator object. As the argument suggests, this pipeline translates from English to German.
The German output means "Hugging Face is a technology company based in New York and Paris", which matches the meaning of the English original.

Swapping in a different model for the translation task

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

tokenizer = AutoTokenizer.from_pretrained('Helsinki-NLP/opus-mt-zh-en')
model = AutoModelForSeq2SeqLM.from_pretrained('Helsinki-NLP/opus-mt-zh-en')
translator = pipeline(
    task='translation_zh_to_en',
    model=model,
    tokenizer=tokenizer,
)
sentence = '我叫萨拉,住在伦敦。'
translator(sentence, max_length=20)
[{'translation_text': 'My name is Sarah, and I live in London.'}]

This code again performs a translation task, but this time Chinese-to-English, which the default translation pipeline does not support. To support it, the default model has to be replaced: the code loads a model together with its matching tokenizer, then passes both into the pipeline() function to obtain a translation pipeline with the model swapped in.
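Loading the model and tokenizer by hand is not strictly required: pipeline() can also take the checkpoint name directly and will instantiate both itself (a sketch of the same zh-to-en task):

```python
from transformers import pipeline

# pipeline() downloads and wires up the model and its tokenizer from the name alone.
translator = pipeline('translation_zh_to_en', model='Helsinki-NLP/opus-mt-zh-en')
result = translator('我叫萨拉,住在伦敦。', max_length=20)
print(result[0]['translation_text'])
```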