Command-line computer vision.

Linux for blind general discussion blinux-list at redhat.com
Wed Jun 2 16:40:14 UTC 2021


Hey there,

Well, it depends on how much you like to play around with things.


As for local options, there is the im2txt model:

https://github.com/HughKu/Im2txt


It's quite interesting, though the code is out of date: it still targets
TensorFlow 1.x, while the current version is 2.x, which is not backward
compatible.

So some fiddling will be necessary to get it working.


Google's Inception is also an option:

https://github.com/tensorflow/models


This repository doesn't currently contain it, but it definitely did in
the past, so if you can work with git, you can jump a few years back in
the history to find it.
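
In case it helps, here is a rough Python sketch of that "jump back" step.
It only wraps two ordinary git commands (rev-list and checkout). The
branch name "master" and whatever date you pass in are my assumptions,
not something I have checked against the repository, so you will
probably have to try a few dates before the Inception code shows up.

```python
import subprocess


def rev_list_args(date, branch):
    """Arguments for `git rev-list` selecting the newest commit before `date`."""
    return ["rev-list", "-n", "1", f"--before={date}", branch]


def checkout_before(repo_dir, date, branch="master"):
    """Check out the last commit on `branch` made before `date`.

    Leaves the clone in a detached-HEAD state and returns the commit hash.
    `branch` defaults to "master", which is an assumption about the repo.
    """
    found = subprocess.run(
        ["git", "-C", repo_dir, *rev_list_args(date, branch)],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    if not found:
        raise ValueError(f"no commit on {branch} before {date}")
    subprocess.run(["git", "-C", repo_dir, "checkout", found], check=True)
    return found
```

After cloning the repository into a directory called models, a call like
checkout_before("models", "2018-01-01") would move the working tree back
to the state just before that (guessed) date.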

I'm not sure whether they updated it for TF 2, so again some
experimenting may be necessary to get the right environment.


There is in fact a newer version of Inception, v4, but I have not tested
that one myself and don't know whether there are any simple-to-use
applications built on it.


Also, as for the difference between Im2txt and Inception: Im2txt
describes whole scenes, like the example one, "A man surfing a wave",
while Inception just recognizes objects (man, surfboard, wave perhaps)
and tells you how sure it is that each item is there.
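
To give an idea of what the "how sure it is" part looks like in practice,
here is a small sketch in plain Python. A classifier of this kind
typically produces one raw score per known label; a softmax turns those
scores into values that sum to 1, and you then read off the top few. The
labels and scores in the usage note below are made up for illustration
and are not real Inception output.

```python
import math


def softmax(scores):
    """Turn raw scores into confidences that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_labels(labels, scores, k=3):
    """Return the k most likely (label, confidence) pairs, best first."""
    ranked = sorted(zip(labels, softmax(scores)),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

For example, top_labels(["man", "surfboard", "wave", "dog"],
[2.0, 1.0, 3.0, 0.0], k=2) ranks "wave" first and "man" second, each
paired with its confidence.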


There are also online systems, if you don't mind sharing your photos
with third parties.

Probably the best one I've seen so far is Cloudsight:

https://cloudsight.ai/

It is the same service that is used by TapTapSee and CamFind.


Their descriptions are very accurate. They use machine learning
combined with human oversight, so even though recognition used to take
about 15-20 seconds (I don't know about the current state), the results
were usually worth it.


They now even offer video recognition and offline object detection,
though I don't know how accurate those are.


Cloudsight has a public API, which even provides free recognition up to
some limit. You can try it on their website, and you could perhaps write
a script for it so that you can access the service easily from the
command line. Maybe there are already some on GitHub; that might be
worth checking as well if you're not into programming.
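
Whichever recognizer you pick, the script around it is mostly the same
for your renaming use case, so here is a rough sketch of that part only.
The describe argument is a hypothetical function of your own that takes
an image path and returns a caption (from a local model or an online
service); I am not showing any real Cloudsight call here, since I have
not checked their current API details.

```python
import re
from pathlib import Path


def caption_to_stem(caption, max_words=8):
    """Turn a caption into a short, shell-friendly filename stem."""
    # Keeps only ASCII letters and digits; everything else is a separator.
    words = re.findall(r"[a-z0-9]+", caption.lower())
    return "-".join(words[:max_words]) or "unnamed"


def unique_target(path, stem):
    """Pick a name next to `path` that does not clobber another file."""
    path = Path(path)
    suffix = path.suffix.lower()
    candidate = path.with_name(stem + suffix)
    counter = 2
    while candidate.exists() and candidate != path:
        candidate = path.with_name(f"{stem}-{counter}{suffix}")
        counter += 1
    return candidate


def rename_with_caption(path, describe):
    """Rename one photo using the caption returned by `describe(path)`."""
    path = Path(path)
    target = unique_target(path, caption_to_stem(describe(path)))
    path.rename(target)
    return target
```

With a caption such as "A man surfing a wave.", a photo would end up as
a-man-surfing-a-wave.jpg, and a second photo with the same caption would
get -2 appended instead of overwriting the first.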


Best regards


Rastislav


On 26 May 2021 at 23:47, Linux for blind general discussion wrote:
> Okay, I'm aware of Tesseract and cuneiform for doing OCR on image
> files, but I was wondering if anyone on this list knew of any
> command-line utilities that might be able to tell me useful things
> about the contents of images that contain no text. Even something as
> simple as printing the image's palette in descending order of
> abundance or recognition of basic geometric shapes would be useful I
> think.
>
> My primary use case is giving meaningful filenames to digital photos
> where I know what photos are in the set, but not which photo is which,
> and primarily, the photos are of crafts I've made and taken with the
> camera my portable mediaplayer/talking eReader uses for OCRing print
> documents (the device gives the photos very long, numeric filenames
> that might be timestamps, but even that isn't of much use if I take
> more than one photo in a round of blind photography and transferring
> photos to my Desktop, especially since the device's clock resets to
> midnight the morning of January 1, 2014 whenever the battery is
> pulled out).
>
> I've tried googling and searching the package lists in Aptitude, but
> all I've managed to find are libraries for writing computer vision code
> into robotics projects or cloud-based complex object AI stuff.