
PyText v0.3.2

@mwu1993 released this 27 Apr 08:24

New features

  • Add RoBERTa model to BertPairwiseModel (#1336)
  • Support reading files from an HTTP URL (#1317)
  • Add a new get_num_examples_from_batch function to PyText models (#1319)
  • Add support for length label smoothing (#1308)
  • Add new metrics type for Masked Seq2Seq Joint Model (#1304)
  • Add mask generator and strategy (#1302)
  • Add separate logging for label loss and length loss (#1294)
  • Add tensorizer support for masking of target tokens (#1297)
  • Add length prediction and basic masked generator (#1290)
  • Add self attention option to conv_encoder and conv_decoder (#1291)
  • Entity Saliency modeling on PyText: EntitySalienceMetricReporter/EntitySalienceTask
  • In-batch negative training for BertPairwiseModel (see the sketch after this list)
  • Support embedding from decoder (#1284)
  • Add dense features to RoBERTa
  • Add projection layer to HuggingFace encoder (#1273)
  • Add PyText Embedding TorchScript wrapper
  • Add option to pad missing label in LabelListTensorizer (#1269)
  • Integrate PET and Introduce ElasticTrainer (#1266)
  • Support PoolingType in DocNN (#1259)
  • Add WordSeqEmbedding (#1255)
  • Open source Assistant NLU seq2seq model (#1236)
  • Support multi-label classification
  • BART in decoupled model
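
Several of these features are modeling techniques rather than API changes. For instance, the in-batch negative training added to BertPairwiseModel follows the common pattern of treating every other example in a batch as a negative. A minimal PyTorch sketch of the general idea (the function name and shapes are illustrative, not PyText's actual implementation):

```python
import torch
import torch.nn.functional as F

def in_batch_negative_loss(query_emb: torch.Tensor,
                           response_emb: torch.Tensor) -> torch.Tensor:
    # query_emb, response_emb: (batch, dim) embeddings of paired inputs.
    # Score every query against every response in the batch; the true
    # response sits on the diagonal, all other rows act as negatives.
    scores = query_emb @ response_emb.t()              # (batch, batch)
    targets = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, targets)

# Toy usage with random, L2-normalized embeddings:
q = F.normalize(torch.randn(8, 128), dim=-1)
r = F.normalize(torch.randn(8, 128), dim=-1)
print(in_batch_negative_loss(q, r))
```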

Bug fixes

  • Fix Incorrect State Dict Assumption (#1326)
  • Bug fix for "RoBERTaTensorizer object has no attribute is_input" (#1334)
  • Cast model output to cpu (#1329)
  • Fix OSS predict-py API (#1320)
  • Fix "calling median on empty tensor" issue in the metric reporter (#1322)
  • Add ScriptDoNothingTokenizer so that torchscriptification of SPM does not fail (#1316)
  • Fix creating generator every time (#1301)
  • Fix dense features for fp16
  • Avoid edge cases with quantization by setting a known seed (#1295)
  • Make TorchScript predictions work even on empty text/token inputs
  • Fix dense feature TorchScript typing (#1281)
  • Avoid division-by-zero errors in the metrics reporter (#1271; see the guard sketch after this list)
  • Fix contiguous issue in bilstm export (#1270)
  • Fix debug file generation for multi-label classification (#1247)
  • Fix fp16 optimizer attribute name
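
Two of the fixes above (the empty-tensor median in #1322 and the zero division in #1271) follow the same defensive pattern: guard reductions and ratios against empty inputs instead of letting them raise. A hedged sketch of that pattern (names are illustrative, not PyText's code):

```python
import torch

def safe_ratio(numerator: float, denominator: float) -> float:
    # Guard precision/recall-style ratios against a zero denominator
    # (e.g., no predicted positives) instead of raising ZeroDivisionError.
    return numerator / denominator if denominator else 0.0

def safe_median(t: torch.Tensor) -> torch.Tensor:
    # torch.median raises a RuntimeError on empty tensors; fall back
    # to a defined default instead.
    return t.median() if t.numel() > 0 else torch.tensor(0.0)

print(safe_ratio(3, 0))               # 0.0, no exception
print(safe_median(torch.tensor([])))  # tensor(0.)
```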

Other

  • Simplify contextual embedding dimension computation in PyText (#1331)
  • New Debug File for masked seq2seq
  • Move MockConfigLoader to OSS (#1324)
  • Pass optimizer config to the trainer instead of create_optimizer
  • Remove unnecessary torch.no_grad() block (#1323)
  • Fix memory issues in metric reporter for classification tasks over large label spaces
  • Add contextual embedding support to the open-source seq2seq model (#1299)
  • Recover xlm_r tutorial notebook (#1305)
  • Enable controlling bias in MLP decoder
  • Migrate serving tutorial to TorchScript (#1310)
  • Delete Caffe2 export (#1307)
  • Add whitelist for ONNX export
  • Use the dynamic quantization API for BeamSearch (#1303; see the export sketch after this list)
  • Remove requirement that eos/bos be supplied for sequence export (#1300)
  • Multicolumn support
  • Multicolumn support in torchscriptify
  • Add caching support to RawExample and batch predict API (#1298)
  • Add save-pytext-snapshot command to PyText cmdline (#1285)
  • Update with WhatsApp calling data + support dictionary features (#1293)
  • Add arrange_caffe2_model_inputs to BaseModel (#1292)
  • Replace unit tests on LMModel and FLLanguageModelingTask with LiteLMModel and FLLiteLMTask (#1296)
  • Changes to make mBART work (#1911)
  • Handle encoder and decoder embeddings
  • Add tutorial for semantic parsing (#1288)
  • Add new fb beam search with fused operator (#1287)
  • Move generator builder to constructor so that it can be easily overridden (#1286)
  • Torchscriptify ELTensorizer (#1282)
  • Torchscript export for Seq2Seq model (#1265)
  • Change Seq2Seq model from_config() to a more general api (#1280)
  • add max_seq_len to DocNN TorchScript model (#1279)
  • support XLM-R model Embedding in TorchScript (#1278)
  • Generic PyText Checkpoint Manager Interface (#1267)
  • Fix backward compatibility issue of pad_missing in LabelListTensorizer (#1277)
  • Update mean reduction in NLLLoss (#1272)
  • Migrate pages.integrity.scam.docnn_models.xxx (#1275)
  • Unify model input for ByteTokensDocumentModel (#1274)
  • Torchscriptify TokenTensorizer
  • Allow dictionaries to overwrite entries with #fairseq:overwrite comment (#1073)
  • Make WordSeqEmbedding ONNX compatible
  • If the snapshot path provided is not valid, throw error (#1268)
  • Support vocab filtering by min count
  • Unify input for TorchScript Tensorizers and Models (#1256)
  • Torchscriptify XLM-R
  • Add class logging to task (#1264)
  • Add usage logging to exporter (#1262)
  • Add usage logging across models (#1263)
  • Usage logging on data classes (#1261)
  • Add lower-casing support to GPT2 BPE (#1260)
  • FAISS Embedding Search Space [3/5]
  • Return len of tokens of each sequence in SeqTokenTensorizer (#1254)
  • Vocab Limited Pretrained Embedding [2/5] (#1248)
  • Add Stage.OTHERS and allow TensorBoard to print to a separate prefix not in (TRAIN, TEST, EVAL) (#1258)
  • Add option to skip the two-stage tokenizer and BPE-decode sequences in the debug file (#1257)
  • Add test case for WordPiece tokenizer (#1249)
  • Modify accuracy calculation for multi-label classification (#1244; see the accuracy sketch after this list)
  • Enable tests in pytext/config:pytext_all_config_test
  • Introduce Class Usage Logging (#1243)
  • Make PyText compatible with Any type (#1242)
  • Make dict_embedding Torchscript friendly (#1240)
  • Support MultipleData for export and KD generation
  • Delete flaky/broken tests (#1238)
  • Add support for returning start & end indices
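
A recurring theme above is the migration from Caffe2/ONNX export to TorchScript, including dynamic quantization for BeamSearch (#1303). A minimal, self-contained sketch of that export path on a toy module (not PyText's actual torchscriptify code; the module and file name are made up):

```python
import torch

class TinyClassifier(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(16, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x).argmax(dim=-1)

model = TinyClassifier().eval()
# Dynamic quantization swaps Linear layers for int8 kernels at runtime;
# this is the same torch.quantization API family the BeamSearch change uses.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
scripted = torch.jit.script(quantized)  # TorchScript export
scripted.save("tiny_classifier.pt")
```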
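
Likewise, #1244 changes how accuracy is computed for multi-label classification. A common example-based definition, shown here purely as a hedged illustration (not necessarily the exact formula PyText adopted), scores each example by the overlap between the predicted and gold label sets:

```python
from typing import List, Set

def multilabel_accuracy(predicted: List[Set[str]],
                        gold: List[Set[str]]) -> float:
    # Example-based (Jaccard) accuracy: |pred ∩ gold| / |pred ∪ gold| per
    # example, averaged over the dataset; empty vs. empty counts as a match.
    def score(p: Set[str], g: Set[str]) -> float:
        union = p | g
        return len(p & g) / len(union) if union else 1.0
    return sum(score(p, g) for p, g in zip(predicted, gold)) / len(gold)

print(multilabel_accuracy([{"a", "b"}, {"c"}], [{"a"}, {"c"}]))  # 0.75
```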