Post by @leading • Hey

ResNet perplexity decreases faster with model size than Transformer perplexity does.
