In this brief post, I discuss some recent trends in ML and list some notable recent works. The way we train SotA models differs slightly from a few years ago, in order to optimize performance: we first build a massive (often multimodal) dataset crawled from the Web and model-parallelize... Continue Reading →
In this brief post, I describe a very coarse learning roadmap for ML, limited to what you can learn from lectures. Once you are beyond this level, you may want to move on to the sequel to this blog post, Current Landscape of Machine Learning, which describes which papers and external sources you... Continue Reading →
Summary: We have released GPT-J-6B, a 6B-parameter JAX-based (Mesh) Transformer LM (GitHub). GPT-J-6B performs nearly on par with the 6.7B GPT-3 (Curie) on various zero-shot downstream tasks. You can try it out in this Colab notebook or the free web demo. The library also serves as an example of model parallelism with xmap in JAX. Below, we will refer to GPT-J-6B by... Continue Reading →
I have aggregated some of the SotA image generative models released recently, with short summaries, visualizations, and comments. I summarize the overall development and speculate on future trends. Many of the statements and results here apply readily to other non-textual modalities, such as audio and video.