1486

submitted 2 years ago by nifty@lemmy.world to c/microblogmemes@lemmy.world

[-] MrConfusion@lemmy.world 11 points 2 years ago

Well, this is simply incorrect. And confidently incorrect at that.

Vision transformers (ViT) are an important branch of computer vision models that apply transformers to image analysis and detection tasks. They perform very well. The main idea is the same: by tokenizing the input image into smaller chunks (patches), you can apply the same attention mechanism as in NLP transformer models.
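To make the tokenization idea concrete, here is a minimal pure-Python sketch, not the paper's implementation. It cuts a toy 4x4 "image" into 2x2 patches, flattens each patch into a token, and runs one pass of scaled dot-product self-attention over the tokens. The names `patchify` and `self_attention` are made up for illustration, and the sketch assumes identity projections (Q = K = V = the raw tokens). A real ViT adds a learned linear patch embedding, position embeddings, learned query/key/value projections, and multiple heads and layers.

```python
import math

def patchify(image, patch):
    """Split an H x W grid (list of rows) into flattened patch x patch tokens."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must divide evenly into patches"
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Flatten one patch row by row into a single token vector.
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention, assuming Q = K = V = tokens."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output token is a weighted average of all tokens.
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

# Toy single-channel 4x4 image with values in [0, 1).
image = [[(r * 4 + c) / 16 for c in range(4)] for r in range(4)]
tokens = patchify(image, 2)     # 4 tokens, each of length 4
mixed = self_attention(tokens)  # same shape, each token now mixes in the others
```

The point is only that once the image is a sequence of token vectors, the attention step is the same one an NLP transformer applies to word tokens.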

ViT models were introduced in 2020 by Dosovitskiy et al. in the landmark paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (https://arxiv.org/abs/2010.11929), a work that has received almost 30000 academic citations since its publication.

So claiming transformers only improve natural language and vision output is straight up wrong. They are also widely used in visual analysis, including classification and detection.

[-] Barbarian@sh.itjust.works 1 points 2 years ago

Thank you for the correction. So hypothetically, with millions of hours of GoPro footage from the scuttle crew, and some futuristic supercomputer that could crunch live data from a standard-definition camera and output decisions, could we hook that up to a Boston Dynamics-style robot and have it replace one member of the crew?

this post was submitted on 26 Feb 2024

1486 points (94.8% liked)