In Canadian author Margaret Atwood’s novel "The Blind Assassin," she writes that “touch comes before sight, before speech. It’s the first language and the last, and it always tells the truth.”
While our sense of touch gives us a channel to feel the physical world, our eyes help us immediately understand the full picture of these tactile signals.
Robots that have been programmed to see or feel can’t use these signals quite as interchangeably. To better bridge this sensory gap, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have come up with a predictive artificial intelligence (AI) that can learn to see by touching, and learn to feel by seeing.
The team’s system can create realistic tactile signals from visual inputs, and predict which object and what part is being touched directly from those tactile inputs. They used a KUKA robot arm with a special tactile sensor called GelSight, designed by another group at MIT.
Using a simple web camera, the team recorded nearly 200 objects, such as tools, household products, fabrics, and more, being touched more than 12,000 times. Breaking those 12,000 video clips down into static frames, the team compiled “VisGel,” a dataset of more than 3 million visual/tactile-paired images.
“By looking at the scene, our model can imagine the feeling of touching a flat surface or a sharp edge,” says Yunzhu Li, CSAIL PhD student and lead author on a new paper about the system. “By blindly touching around, our model can predict the interaction with the environment purely from tactile feelings. Bringing these two senses together could empower the robot and reduce the data we might need for tasks involving manipulating and grasping objects.”
The team’s technique bridges this sensory gap by using the VisGel dataset and something called generative adversarial networks (GANs).
GANs use visual or tactile images to generate images in the other modality. They work by pitting a “generator” against a “discriminator”: the generator aims to create real-looking images to fool the discriminator, and every time the discriminator “catches” the generator, the feedback from that decision allows the generator to repeatedly improve its outputs.
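The adversarial loop can be sketched in miniature. The following is a hypothetical toy illustration, not the team's model: a one-parameter "generator" learns to produce numbers resembling samples from a target distribution, while a logistic "discriminator" learns to tell real from fake; the discriminator's gradient feedback is what drives the generator's improvement.

```python
import math
import random

random.seed(0)

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

REAL_MEAN = 4.0   # hypothetical "real data": samples clustered near 4.0

# Generator g(z) = w_g*z + b_g maps noise to fake samples;
# discriminator d(x) = sigmoid(w_d*x + b_d) scores how "real" a sample looks.
w_g, b_g = 1.0, 0.0
w_d, b_d = 0.0, 0.0
lr = 0.02

for _ in range(1500):
    x_real = random.gauss(REAL_MEAN, 0.5)
    z = random.gauss(0.0, 1.0)
    x_fake = w_g * z + b_g

    d_real = sigmoid(w_d * x_real + b_d)
    d_fake = sigmoid(w_d * x_fake + b_d)

    # Discriminator update: push d(real) toward 1 and d(fake) toward 0.
    w_d -= lr * ((d_real - 1.0) * x_real + d_fake * x_fake)
    b_d -= lr * ((d_real - 1.0) + d_fake)

    # Generator update: the discriminator's feedback (its gradient) tells
    # the generator which way to move so its fakes look more real.
    du = (d_fake - 1.0) * w_d
    w_g -= lr * du * z
    b_g -= lr * du

fakes = [w_g * random.gauss(0.0, 1.0) + b_g for _ in range(500)]
mean_fake = sum(fakes) / len(fakes)
print(round(mean_fake, 2))  # should have drifted from 0 toward the real mean
```

Real GANs replace the two linear models with deep networks and the numbers with images, but the alternating-update structure is the same.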
Vision to touch
Humans can infer how an object feels just by seeing it. To better give machines this power, the system first had to locate the position of the touch, and then deduce information about the shape and feel of the region.
The reference images — without any robot-object interaction — helped the system encode details about the objects and the environment. Then, when the robot arm was operating, the model could simply compare the current frame with its reference image, and easily identify the location and scale of the touch.
This might look something like feeding the system an image of a computer mouse, and then “seeing” the area where the model predicts the object should be touched for pickup — which could vastly help machines plan safer and more efficient actions.
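The reference-frame comparison can be illustrated with simple frame differencing. This sketch is a loose stand-in for what the model learns (the real system uses learned features, not raw pixel differences): subtract the reference frame from the current frame and bound the region that changed.

```python
import numpy as np

# Reference frame (no contact) and a current frame where a "touch"
# has changed pixels at rows 10-14, columns 20-29 (hypothetical values).
reference = np.zeros((64, 64), dtype=float)
current = reference.copy()
current[10:15, 20:30] = 1.0

diff = np.abs(current - reference)
changed = np.argwhere(diff > 0.5)            # pixels that differ from reference
(top, left), (bottom, right) = changed.min(0), changed.max(0)

print(top, bottom, left, right)  # bounding box of the touched region
```

In practice the comparison is done in a learned feature space, which makes it robust to lighting changes and camera noise that raw differencing would trip over.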
Touch to vision
For touch to vision, the aim was for the model to produce a visual image based on tactile data. The model analyzed a tactile image, and then figured out the shape and material of the contact position. It then looked back to the reference image to “hallucinate” the interaction.
For example, if during testing the model was fed tactile data on a shoe, it could produce an image of where that shoe was most likely to be touched.
This type of ability could be helpful for accomplishing tasks in cases where there’s no visual data, like when a light is off, or if a person is blindly reaching into a box or unknown area.
The current dataset only has examples of interactions in a controlled environment. The team hopes to improve this by collecting data in more unstructured areas, or by using a new MIT-designed tactile glove, to increase the size and diversity of the dataset.
There are still details that can be tricky to infer from switching modes, like telling the color of an object by just touching it, or telling how soft a sofa is without actually pressing on it. The researchers say this could be improved by creating more robust models for uncertainty, to expand the distribution of possible outcomes.
In the future, this type of model could help with a more harmonious relationship between vision and robotics, especially for object recognition, grasping, better scene understanding, and helping with seamless human-robot integration in an assistive or manufacturing setting.
“This is the first method that can convincingly translate between visual and touch signals,” says Andrew Owens, a postdoc at the University of California at Berkeley. “Methods like this have the potential to be very useful for robotics, where you need to answer questions like ‘is this object hard or soft?’, or ‘if I lift this mug by its handle, how good will my grip be?’ This is a very challenging problem, since the signals are so different, and this model has demonstrated great capability.”
Li wrote the paper alongside MIT professors Russ Tedrake and Antonio Torralba, and MIT postdoc Jun-Yan Zhu. It will be presented next week at the Conference on Computer Vision and Pattern Recognition in Long Beach, California.
Learning to code involves recognizing how to structure a program, and how to fill in every last detail correctly. No wonder it can be so frustrating.
A new program-writing AI, SketchAdapt, offers a way out. Trained on tens of thousands of program examples, SketchAdapt learns how to compose short, high-level programs, while letting a second set of algorithms find the right sub-programs to fill in the details. Unlike similar approaches for automated program-writing, SketchAdapt knows when to switch from statistical pattern-matching to a less efficient, but more versatile, symbolic reasoning mode to fill in the gaps.
“Neural nets are pretty good at getting the structure right, but not the details,” says Armando Solar-Lezama, a professor at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL). “By dividing up the labor — letting the neural nets handle the high-level structure, and using a search strategy to fill in the blanks — we can write efficient programs that give the right answer.”
Program synthesis, or teaching computers to code, has long been a goal of AI researchers. A computer that can program itself is more likely to learn language faster, converse fluently, and even model human cognition. All of this drew Solar-Lezama to the field as a graduate student, where he laid the foundation for SketchAdapt.
Solar-Lezama’s early work, Sketch, is based on the idea that a program’s low-level details could be found mechanically if a high-level structure is provided. Among other applications, Sketch inspired spinoffs to automatically grade programming homework and convert hand-drawn diagrams into code. Later, as neural networks grew in popularity, students from MIT professor Josh Tenenbaum’s computational cognitive science lab suggested a collaboration, out of which SketchAdapt emerged.
Rather than rely on experts to define program structure, SketchAdapt figures it out using deep learning. The researchers also added a twist: When the neural networks are unsure of what code to place where, SketchAdapt is programmed to leave the spot blank for search algorithms to fill.
“The system decides for itself what it knows and doesn’t know,” says the study’s lead author, Maxwell Nye, a graduate student in MIT’s Department of Brain and Cognitive Sciences. “When it gets stuck, and has no familiar patterns to draw on, it leaves placeholders in the code. It then uses a guess-and-check strategy to fill the holes.”
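The division of labor can be mimicked in a few lines. In this hypothetical sketch, the program's high-level structure is fixed (standing in for what a neural net would propose), one spot is left as a hole, and a brute-force guess-and-check search tries candidate expressions until every input-output example passes, as Nye describes.

```python
# Input-output examples the synthesized program must satisfy.
examples = [("alice", "A"), ("bob", "B"), ("carol", "C")]

# Candidate expressions that could fill the hole in the sketch
# "transform = lambda s: HOLE(s).upper()" (structure fixed, detail unknown).
candidates = {
    "first_char": lambda s: s[0],
    "last_char": lambda s: s[-1],
    "whole_string": lambda s: s,
}

def fill_hole(examples, candidates):
    """Guess-and-check: return the name of the first fill satisfying all examples."""
    for name, hole in candidates.items():
        program = lambda s, hole=hole: hole(s).upper()
        if all(program(inp) == out for inp, out in examples):
            return name
    return None

print(fill_hole(examples, candidates))  # → first_char
```

SketchAdapt's search is far more sophisticated, but the principle is the same: the statistical model commits only to the structure it is confident about, and an exhaustive check fills in the rest.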
The researchers compared SketchAdapt’s performance to programs modeled after Microsoft’s proprietary RobustFill and DeepCoder software, successors to Excel’s FlashFill feature, which analyzes adjacent cells to offer suggestions as you type — learning to transform a column of names into a column of corresponding email addresses, for example. RobustFill uses deep learning to write high-level programs from examples, while DeepCoder specializes in finding and filling in low-level details.
The researchers found that SketchAdapt outperformed their reimplemented versions of RobustFill and DeepCoder at their respective specialized tasks. SketchAdapt outperformed the RobustFill-like program at string transformations; for example, writing a program to abbreviate Social Security numbers as three digits, and first names by their first letter. SketchAdapt also did better than the DeepCoder-like program at writing programs to transform a list of numbers. Trained only on examples of three-line list-processing programs, SketchAdapt was better able to transfer its knowledge to a new scenario and write correct four-line programs.
In yet another task, SketchAdapt outperformed both programs at converting math problems from English to code, and calculating the answer.
Key to its success is the ability to switch from neural pattern-matching to a rules-based symbolic search, says Rishabh Singh, a former graduate student of Solar-Lezama’s, now a researcher at Google Brain. “SketchAdapt learns how much pattern recognition is needed to write familiar parts of the program, and how much symbolic reasoning is needed to fill in details which may involve new or complicated concepts.”
SketchAdapt is limited to writing very short programs. Anything more requires too much computation. Nonetheless, it’s intended to complement programmers, not replace them, the researchers say. “Our focus is on giving programming tools to people who want them,” says Nye. “They can tell the computer what they want to do, and the computer can write the program.”
Programming, after all, has always evolved. When Fortran was introduced in the 1950s, it was meant to replace human programmers. “Its full name was Fortran Automatic Coding System, and its goal was to write programs as well as humans, but without the errors,” says Solar-Lezama. “What it really did was automate much of what programmers did before Fortran. It changed the nature of programming.”
The study’s other co-author is Luke Hewitt. Funding was provided by the U.S. Air Force Office of Scientific Research, the MIT-IBM Watson AI Lab, and the U.S. National Science Foundation.
In 2018, researchers at MIT and the auto manufacturer BMW were testing ways in which humans and robots might work in close proximity to assemble car parts. In a replica of a factory floor setting, the team rigged up a robot on rails, designed to deliver parts between work stations. Meanwhile, human workers crossed its path every so often to work at nearby stations.
The robot was programmed to stop momentarily if a person passed by. But the researchers noticed that the robot would often freeze in place, overly cautious, long before a person had crossed its path. If this took place in a real manufacturing setting, such unnecessary pauses could accumulate into significant inefficiencies.
The team traced the problem to a limitation in the trajectory alignment algorithms used by the robot’s motion-predicting software. While the algorithms could reasonably predict where a person was headed, poor time alignment meant they couldn’t anticipate how long that person would spend at any point along the predicted path; in this case, how long it would take for a person to stop, then double back and cross the robot’s path again.
Now, members of that same MIT team have come up with a solution: an algorithm that accurately aligns partial trajectories in real time, allowing motion predictors to accurately anticipate the timing of a person’s motion. When they applied the new algorithm to the BMW factory floor experiments, they found that, instead of freezing in place, the robot simply rolled on and was safely out of the way by the time the person walked by again.
“This algorithm builds in components that help a robot understand and monitor stops and overlaps in movement, which are a core part of human motion,” says Julie Shah, associate professor of aeronautics and astronautics at MIT. “This technique is one of the many ways we’re working on robots better understanding people.”
Shah and her colleagues, including project lead and graduate student Przemyslaw “Pem” Lasota, will present their results this month at the Robotics: Science and Systems conference in Germany.
To enable robots to predict human movements, researchers typically borrow algorithms from music and speech processing. These algorithms are designed to align two complete time series, or sets of related data, such as an audio track of a musical performance and a scrolling video of that piece’s musical notation.
Researchers have used similar alignment algorithms to sync up real-time and previously recorded measurements of human motion, to predict where a person will be, say, five seconds from now. But unlike music or speech, human motion can be messy and highly variable. Even for repetitive movements, such as reaching across a table to screw in a bolt, one person may move slightly differently each time.
Existing algorithms typically take in streaming motion data, in the form of dots representing the position of a person over time, and compare the trajectory of those dots to a library of common trajectories for the given scenario. An algorithm maps a trajectory in terms of the relative distance between dots.
But Lasota says algorithms that predict trajectories based on distance alone can get easily confused in certain common situations, such as temporary stops, in which a person pauses before continuing on their path. While paused, dots representing the person’s position can bunch up in the same spot.
“When you look at the data, you have a whole bunch of points clustered together when a person is stopped,” Lasota says. “If you’re only looking at the distance between points as your alignment metric, that can be confusing, because they’re all close together, and you don’t have a good idea of which point you have to align to.”
The same goes with overlapping trajectories — instances when a person moves back and forth along a similar path. Lasota says that while a person’s current position may line up with a dot on a reference trajectory, existing algorithms can’t differentiate between whether that position is part of a trajectory heading away, or coming back along the same path.
“You may have points close together in terms of distance, but in terms of time, a person’s position may actually be far from a reference point,” Lasota says.
It’s all in the timing
As a solution, Lasota and Shah devised a “partial trajectory” algorithm that aligns segments of a person’s trajectory in real time with a library of previously collected reference trajectories. Importantly, the new algorithm aligns trajectories in both distance and timing, and in so doing, is able to accurately anticipate stops and overlaps in a person’s path.
“Say you’ve executed this much of a motion,” Lasota explains. “Old techniques will say, ‘this is the closest point on this representative trajectory for that motion.’ But since you only completed this much of it in a short amount of time, the timing part of the algorithm will say, ‘based on the timing, it’s unlikely that you’re already on your way back, because you just started your motion.’”
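A standard building block for such alignment is dynamic time warping (DTW), which tolerates pauses by letting one point align to several. The sketch below is generic DTW, not the team's partial-trajectory algorithm (which additionally scores timing), but it shows why a paused trajectory, with its bunched-up repeated points, can still align cleanly to a reference.

```python
def dtw(a, b):
    """Dynamic time warping distance between two 1-D trajectories."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances (b pauses)
                                    cost[i][j - 1],      # b advances (a pauses)
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]

reference = [1.0, 2.0, 3.0, 4.0]
paused = [1.0, 2.0, 2.0, 2.0, 3.0, 4.0]   # same path, with a stop at 2.0
print(dtw(reference, paused))  # → 0.0: the pause aligns at zero cost
```

Distance-only nearest-point matching would be ambiguous at the bunched 2.0 samples; DTW resolves it by requiring the alignment to advance monotonically through both trajectories.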
The team tested the algorithm on two human motion datasets: one in which a person intermittently crossed a robot’s path in a factory setting (these data were obtained from the team’s experiments with BMW), and another in which the group previously recorded hand movements of participants reaching across a table to install a bolt that a robot would then secure by brushing sealant on the bolt.
For both datasets, the team’s algorithm was able to make better estimates of a person’s progress through a trajectory, compared with two commonly used partial trajectory alignment algorithms. Furthermore, the team found that when they integrated the alignment algorithm with their motion predictors, the robot could more accurately anticipate the timing of a person’s motion. In the factory floor scenario, for example, they found the robot was less prone to freezing in place, and instead smoothly resumed its task shortly after a person crossed its path.
While the algorithm was evaluated in the context of motion prediction, it can also be used as a preprocessing step for other techniques in the field of human-robot interaction, such as action recognition and gesture detection. Shah says the algorithm will be a key tool in enabling robots to recognize and respond to patterns of human movements and behaviors. Ultimately, this can help humans and robots work together in structured environments, such as factory settings and even, in some cases, the home.
“This technique could apply to any environment where humans exhibit typical patterns of behavior,” Shah says. “The key is that the [robotic] system can observe patterns that occur over and over, so that it can learn something about human behavior. This is all in the vein of work on the robot better understanding aspects of human motion, to be able to collaborate with us better.”
This research was funded, in part, by a NASA Space Technology Research Fellowship and the National Science Foundation.
A broad class of materials called perovskites is considered one of the most promising avenues for developing new, more efficient solar cells. But the virtually limitless number of possible combinations of these materials’ constituent elements makes the search for promising new perovskites slow and painstaking.
Now, a team of researchers at MIT and several other institutions has accelerated the process of screening new formulations, achieving a roughly ten-fold improvement in the speed of the synthesis and analysis of new compounds. In the process, they have already discovered two sets of promising new perovskite-inspired materials that are worthy of further study.
Their findings are described this week in the journal Joule, in a paper by MIT research scientist Shijing Sun, professor of mechanical engineering Tonio Buonassisi, and 16 others at MIT, in Singapore, and at the National Institute of Standards and Technology in Maryland.
Somewhat surprisingly, although partial automation was employed, most of the improvements in throughput speed resulted from workflow ergonomics, says Buonassisi. That involves more traditional systems efficiencies, often derived by tracking and timing the many steps involved: synthesizing new compounds, depositing them on a substrate to crystallize, and then observing and classifying the resulting crystal formations using multiple techniques.
“There’s a need for accelerated development of new materials,” says Buonassisi, as the world continues to move toward solar energy, including in regions with limited space for solar panels. But the typical system for developing new energy-conversion materials can take 20 years, with significant upfront capital costs, he says. His team’s aim is to cut that development time to under two years.
Essentially, the researchers developed a system that allows a wide variety of materials to be made and tested in parallel. “We’re now able to access a large range of different compositions, using the same materials synthesis platform. It allows us to explore a vast range of parameter space,” he says.
Perovskite compounds consist of three separate constituents, traditionally labeled as A, B, and X site ions, each of which can be any one of a list of candidate elements, forming a very large structural family with diverse physical properties. In the field of perovskite and perovskite-inspired materials for photovoltaic applications, the B-site ion is typically lead, but a major effort in perovskite research is to find viable lead-free versions that can match or exceed the performance of the lead-based varieties.
While more than a thousand potentially useful perovskite formulations have been predicted theoretically, out of millions of theoretically possible combinations, only a small fraction of those has been produced experimentally so far, highlighting the need for an accelerated process, the researchers say.
For the experiments, the team selected a variety of different compositions, each of which they mixed in a solution and then deposited on a substrate, where the material crystallized into a thin film. The film was then examined using a technique called X-ray diffraction, which can reveal details of how the atoms are arranged in the crystal structure. These X-ray diffraction patterns were then initially classified with the help of a convolutional neural network system to speed up that part of the process. That classification step alone, Buonassisi says, initially took three to five hours, but by applying machine learning, this was slashed to 5.5 minutes while maintaining 90 percent accuracy.
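The reported numbers imply the machine-learning step alone bought a roughly 30-to-55-fold speedup on classification; a quick check:

```python
minutes_manual = (3 * 60, 5 * 60)   # 3-5 hours of manual classification, in minutes
minutes_ml = 5.5                    # CNN-assisted classification

speedup = tuple(m / minutes_ml for m in minutes_manual)
print(tuple(round(s, 1) for s in speedup))  # → (32.7, 54.5)
```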
Already, in their initial testing of the system, the team explored 75 different formulations in about a tenth of the time it previously would have taken to synthesize and characterize that many. Among those 75, they found two new lead-free perovskite systems that exhibit promising properties that might have potential for high-efficiency solar cells.
In the process, they produced four compounds in thin-film form for the first time; thin films are the desirable form for use in solar cells. They also found examples of “nonlinear bandgap tunability” in some of the materials, an unexpected characteristic that relates to the energy level needed to excite an electron in the material, which they say opens up new pathways for potential solar cells.
The team says that with further automation of parts of the process, it should be possible to continue to increase the processing speed, making it anywhere from 10 to 100 times as fast. Ultimately, Buonassisi says, it’s all about getting solar power to be as inexpensive as possible, continuing the technology’s already remarkable plunge. The aim is to bring economically sustainable prices below 2 cents per kilowatt-hour, he says, and getting there could be the result of a single breakthrough in materials: “All you have to do is make one material” that has just the right combination of properties — including ease of manufacture, low cost of materials, and high efficiency at converting sunlight.
“We’re putting all the experimental pieces in place so we can explore faster,” he says.
The work was supported by Total SA through the MIT Energy Initiative, by the National Science Foundation, and Singapore’s National Research Foundation through the Singapore-MIT Alliance for Research and Technology.
MIT researchers have developed a novel “photonic” chip that uses light instead of electricity — and consumes relatively little power in the process. The chip could be used to process massive neural networks millions of times more efficiently than today’s classical computers do.
Neural networks are machine-learning models that are widely used for such tasks as robotic object identification, natural language processing, drug development, medical imaging, and powering driverless cars. Novel optical neural networks, which use optical phenomena to accelerate computation, can run much faster and more efficiently than their electrical counterparts.
But as traditional and optical neural networks grow more complex, they eat up tons of power. To tackle that issue, researchers and major tech companies — including Google, IBM, and Tesla — have developed “AI accelerators,” specialized chips that improve the speed and efficiency of training and testing neural networks.
For electrical chips, including most AI accelerators, there is a theoretical minimum limit for energy consumption. Recently, MIT researchers have started developing photonic accelerators for optical neural networks. These chips perform orders of magnitude more efficiently, but they rely on some bulky optical components that limit their use to relatively small neural networks.
In a paper published in Physical Review X, MIT researchers describe a new photonic accelerator that uses more compact optical components and optical signal-processing techniques, to drastically reduce both power consumption and chip area. That allows the chip to scale to neural networks several orders of magnitude larger than its counterparts.
Simulated training of neural networks on the MNIST image-classification dataset suggests the accelerator can theoretically process neural networks more than 10 million times below the energy-consumption limit of traditional electrical-based accelerators and about 1,000 times below the limit of photonic accelerators. The researchers are now working on a prototype chip to experimentally prove the results.
“People are looking for technology that can compute beyond the fundamental limits of energy consumption,” says Ryan Hamerly, a postdoc in the Research Laboratory of Electronics. “Photonic accelerators are promising … but our motivation is to build a [photonic accelerator] that can scale up to large neural networks.”
Practical applications for such technologies include reducing energy consumption in data centers. “There’s a growing demand for data centers for running large neural networks, and it’s becoming increasingly computationally intractable as the demand grows,” says co-author Alexander Sludds, a graduate student in the Research Laboratory of Electronics. The aim is “to meet computational demand with neural network hardware … to address the bottleneck of energy consumption and latency.”
Joining Sludds and Hamerly on the paper are: co-author Liane Bernstein, an RLE graduate student; Marin Soljacic, an MIT professor of physics; and Dirk Englund, an MIT associate professor of electrical engineering and computer science, a researcher in RLE, and head of the Quantum Photonics Laboratory.
Neural networks process data through many computational layers containing interconnected nodes, called “neurons,” to find patterns in the data. Neurons receive input from their upstream neighbors and compute an output signal that is sent to neurons further downstream. Each input is also assigned a “weight,” a value based on its relative importance to all other inputs. As the data propagate “deeper” through layers, the network learns progressively more complex information. In the end, an output layer generates a prediction based on the calculations throughout the layers.
All AI accelerators aim to reduce the energy needed to process and move around data during a specific linear algebra step in neural networks, called “matrix multiplication.” There, neurons and weights are encoded into separate tables of rows and columns and then combined to calculate the outputs.
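The matrix multiplication in question is ordinary linear algebra. In this generic sketch (not specific to any accelerator), each row of the weight matrix belongs to one output neuron, each column to one input, and a single matrix-vector product computes a whole layer's outputs:

```python
import numpy as np

inputs = np.array([0.5, -1.0, 2.0])           # activations from the previous layer

# One row of weights per output neuron, one column per input neuron.
weights = np.array([[0.1, 0.2, 0.3],
                    [0.4, 0.5, 0.6]])

outputs = weights @ inputs                     # the "matrix multiplication" step
print(outputs)  # → [0.45 0.9 ]
```

An accelerator's job is to perform exactly this multiply-and-accumulate arithmetic, just with as little energy and data movement as possible.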
In traditional photonic accelerators, pulsed lasers encoded with information about each neuron in a layer flow into waveguides and through beam splitters. The resulting optical signals are fed into a grid of square optical components, called “Mach-Zehnder interferometers,” which are programmed to perform matrix multiplication. The interferometers, which are encoded with information about each weight, use signal-interference techniques that process the optical signals and weight values to compute an output for each neuron. But there’s a scaling issue: for each neuron there must be one waveguide and, for each weight, there must be one interferometer. Because the number of weights scales with the square of the number of neurons, those interferometers take up a lot of real estate.
“You quickly realize the number of input neurons can never be larger than 100 or so, because you can’t fit that many components on the chip,” Hamerly says. “If your photonic accelerator can’t process more than 100 neurons per layer, then it makes it difficult to implement large neural networks into that architecture.”
The researchers’ chip relies on a more compact, energy-efficient “optoelectronic” scheme that encodes data with optical signals, but uses “balanced homodyne detection” for matrix multiplication. That’s a technique that produces a measurable electrical signal after calculating the product of the amplitudes (wave heights) of two optical signals.
Pulses of light encoded with information about the input and output neurons for each neural network layer — which are needed to train the network — flow through a single channel. Separate pulses encoded with information about entire rows of weights in the matrix multiplication table flow through separate channels. Optical signals carrying the neuron and weight data fan out to a grid of homodyne photodetectors. The photodetectors use the amplitude of the signals to compute an output value for each neuron. Each detector feeds an electrical output signal for each neuron into a modulator, which converts the signal back into a light pulse. That optical signal becomes the input for the next layer, and so on.
The design requires only one channel per input and output neuron, and only as many homodyne photodetectors as there are neurons, not weights. Because there are always far fewer neurons than weights, this saves significant space, so the chip is able to scale to neural networks with more than a million neurons per layer.
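The scaling argument is easy to quantify. Under the stated assumptions, one interferometer per weight in the older design versus one homodyne detector per neuron in the new one, the component counts for a fully connected layer of N neurons diverge fast:

```python
def component_counts(n_neurons):
    """Component counts for one fully connected N-to-N layer (stated assumptions)."""
    interferometers = n_neurons ** 2   # older design: one per weight
    detectors = n_neurons              # new design: one per neuron
    return interferometers, detectors

for n in (100, 1_000_000):
    interf, det = component_counts(n)
    print(f"N={n}: {interf} interferometers vs {det} detectors")
```

At the roughly 100-neuron ceiling Hamerly cites, the interferometer grid already needs 10,000 components; at a million neurons per layer it would need a trillion, while the detector count stays at a million.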
Finding the sweet spot
With photonic accelerators, there’s an unavoidable noise in the signal. The more light that’s fed into the chip, the less noise and greater the accuracy — but that gets to be pretty inefficient. Less input light increases efficiency but negatively impacts the neural network’s performance. But there’s a “sweet spot,” Bernstein says, that uses minimum optical power while maintaining accuracy.
That sweet spot for AI accelerators is measured in how many joules it takes to perform a single operation of multiplying two numbers — such as during matrix multiplication. Right now, traditional accelerators are measured in picojoules, or one-trillionth of a joule; photonic accelerators are measured in attojoules, making them a million times more efficient.
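The units make the gap concrete. A quick, illustrative calculation, assuming a hypothetical workload of one billion multiply operations:

```python
PICOJOULE = 1e-12   # per-operation energy scale of today's electrical accelerators
ATTOJOULE = 1e-18   # per-operation energy scale of photonic accelerators

n_ops = 1_000_000_000   # hypothetical workload: one billion multiplications

electrical_joules = n_ops * PICOJOULE
photonic_joules = n_ops * ATTOJOULE
print(electrical_joules, photonic_joules)  # photonic figure is a millionth the size
```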
In their simulations, the researchers found their photonic accelerator could operate with sub-attojoule efficiency. “There’s some minimum optical power you can send in, before losing accuracy. The fundamental limit of our chip is a lot lower than traditional accelerators … and lower than other photonic accelerators,” Bernstein says.
The reports summarize the efforts of five working groups which, over the last few months, have been studying ideas and options for the college, including its structure, curriculum, faculty appointment and hiring practices, social responsibilities, and computing infrastructure. The working groups have been informed by a series of community forums; further feedback from the MIT community is now sought in response to the reports.
The Institute announced in October 2018 the creation of the MIT Schwarzman College of Computing, which represents the biggest institutional change to MIT since 1950. MIT is largely structured around five schools, which are the Institute’s main sites for undergraduate and graduate education and research.
In response to the pervasiveness of computing in society and academic inquiry, the MIT Schwarzman College of Computing will serve as a campus-wide “bridge” across disciplines. It will advance research in computing and computer science — especially in artificial intelligence — and enhance our understanding of the social and ethical implications of technology.
Working on solutions
The working groups consist of over 100 MIT faculty, students, and staff, and have been in operation since February, with the help of community input and a campus-wide Idea Bank. The groups each submitted separate reports last week.
The working group co-chairs are also part of a steering committee which is helping guide the formation of the new college and has convened frequently in recent months to examine overlapping areas of interest among the groups. Steering committee members also include MIT Provost Martin A. Schmidt, Dean of Engineering Anantha Chandrakasan, and Faculty Chair Susan Silbey.
“I wish to express my deep appreciation to the Steering Committee and to all of the members of the working groups for their dedicated work during the last several months, especially knowing that they had a great deal of territory to cover during a relatively short span of time,” said Schmidt in an email sent to the MIT community today. “We are extremely grateful for their efforts.”
Each working group evaluated multiple, often overlapping ideas about the Schwarzman College of Computing. These working group reports do not represent a series of final decisions about the college; rather, they detail important organizational options, often weighing pros and cons of particular ideas.
The Working Group on Organizational Structure was chaired by Asu Ozdaglar, head of the Department of Electrical Engineering and Computer Science (EECS) and the School of Engineering Distinguished Professor of Engineering, and Nelson Repenning, associate dean of leadership and special projects and the Sloan School of Management Distinguished Professor of System Dynamics and Organization Studies.
The group evaluated the best organizational structure for the MIT Schwarzman College of Computing in light of the existing strengths of computing research in EECS and the overall needs of MIT’s five schools: the School of Engineering; the School of Science; the School of Humanities, Arts, and Social Sciences; the School of Architecture and Planning; and the Sloan School of Management.
The working group discussed a structure in which all five schools work to create interdisciplinary core course offerings in the new college. Another key issue the group has been examining is the relationship between the college and EECS. Additionally, the group outlined several ways that faculty can be affiliated with the college while continuing as members of their own departments and programs.
The Faculty Appointments Working Group was co-chaired by Eran Ben-Joseph, head of the Department of Urban Studies and Planning, and William Freeman, the Thomas and Gerd Perkins Professor of Electrical Engineering.
The group examined options concerning four related topics: types of faculty appointments, hiring models, faculty rights and responsibilities, and faculty mentoring handbooks. Many faculty hires could be joint appointments, the group proposed, with teaching and research in both the new college and existing departments; the college’s hiring process could also allow for a significant portion of new faculty to have this kind of multidisciplinary status.
If this approach is followed, the working group suggested, joint-faculty roles, rights and obligations need to be well-defined — including research expectations and teaching commitments — and guidelines for faculty mentoring should be established in advance.
The Working Group on Curriculum and Degrees was co-chaired by Srini Devadas, the Edwin Sibley Webster Professor of Electrical Engineering and Computer Science, and Troy Van Voorhis, the Haslam and Dewey Professor of Chemistry.
Proposals from this group include ways to encourage more undergraduates to complete the flexible computer science minor or to pursue “threads” — sets of coursework similar to minors — enhancing computing studies within their own majors. MIT might continue to expand joint degrees or even more-encompassing double majors, and might consider establishing a General Institute Requirement in computing. The group also examined graduate education and developed ideas about graduate degrees and certificates in computation, as well as the expansion of joint graduate degrees that include computing. The group also outlined a variety of ways new curriculum development may occur.
Broadly, the working group examined how best to incorporate social and ethical considerations into the college’s fabric — including education, research, and external engagement. On the education front, the group examined how stand-alone classes about ethics and social responsibility could be woven into the college curriculum. They also evaluated how smaller educational units about social issues could be incorporated within other classes. The group also proposed new ideas about including an ethics dimension in research and extracurricular learning — such as leveraging MIT’s UROP program or mentored projects to provide a strong grounding in ethics-focused work.
The Working Group on College Infrastructure was co-chaired by Benoit Forget, an associate professor in the Department of Nuclear Science and Engineering, and Nicholas Roy, a professor in the Department of Aeronautics and Astronautics and a member of CSAIL.
This working group took a particularly in-depth look at MIT’s future needs in the area of computing infrastructure. The group suggested that MIT’s future computing infrastructure is unlikely to be optimized around a single model of computing access, given the diversity of research projects and needs on campus. In general, the group suggested that support for a renewed computing infrastructure and improved data management should be a high priority for the college, and might include expanded student training and increased professional staffing in computing.
The way forward
Members of the MIT community are encouraged to examine the latest reports and offer input about the MIT Schwarzman College of Computing.
“I invite you to review these preliminary reports and provide us with your feedback,” Schmidt said in his letter to the community, adding: “I look forward to further opportunities for community involvement in the early phases and continuing development of our new college.”
He noted that community input will be collected until June 28, after which the final reports will be posted.
The official launch of the MIT Schwarzman College of Computing will occur this fall, with the full development of the college occurring over a period of several years. MIT aims to add 50 full-time faculty over a five-year period, appointed both within the college and jointly with departments across MIT. The Institute has also identified the location for a new building for the college, on the site of 44 Vassar Street, between Massachusetts Avenue and Main Street, and aims to open the new facility by late 2022.
In February, MIT announced the appointment of Dan Huttenlocher SM ’84 PhD ’88 as the first dean of the college. Huttenlocher will begin the new post this summer.
The MIT Schwarzman College of Computing is being supported by a $1 billion commitment for new research and education in computing, the biggest investment of its kind by a U.S. academic institution. The core support for the new college comes from a $350 million foundational gift from Stephen A. Schwarzman, the chairman, CEO, and co-founder of Blackstone, the global asset management and financial services firm.
Objects made with 3-D printing can be lighter, stronger, and more complex than those produced through traditional manufacturing methods. But several technical challenges must be overcome before 3-D printing transforms the production of most devices.
Commercially available printers generally offer only high speed, high precision, or high-quality materials. Rarely do they offer all three, limiting their usefulness as a manufacturing tool. Today, 3-D printing is used mainly for prototyping and low-volume production of specialized parts.
Now Inkbit, a startup out of MIT, is working to bring all of the benefits of 3-D printing to a slew of products that have never been printed before — and it’s aiming to do so at volumes that would radically disrupt production processes in a variety of industries.
The company is accomplishing this by pairing its multimaterial inkjet 3-D printer with machine-vision and machine-learning systems. The vision system comprehensively scans each layer of the object as it’s being printed to correct errors in real time, while the machine-learning system uses that information to predict the warping behavior of materials and make more accurate final products.
“The company was born out of the idea of endowing a 3-D printer with eyes and brains,” says Inkbit co-founder and CEO Davide Marini PhD ’03.
That idea unlocks a range of applications for Inkbit’s machine. The company says it can print more flexible materials much more accurately than other printers. If an object, such as a computer chip or other electronic component, is placed on the print area, the machine can precisely print materials around it. And when an object is complete, the machine keeps a digital replica that can be used for quality assurance.
Inkbit is still an early-stage company. It currently has one operational production-grade printer. But it will begin selling printed products later this year, starting with a pilot with Johnson & Johnson, before selling its printers next year. If Inkbit can leverage current interest from companies that sell medical devices, consumer products, and automotive components, its machines will be playing a leading production role in a host of multi-billion-dollar markets in the next few years, from dental aligners to industrial tooling and sleep apnea masks.
“Everyone knows the advantages of 3-D printing are enormous,” Marini says. “But most people are experiencing problems adopting it. The technology just isn’t there yet. Our machine is the first one that can learn the properties of a material and predict its behavior. I believe it will be transformative, because it will enable anyone to go from an idea to a usable product extremely quickly. It opens up business opportunities for everyone.”
A printer with potential
Some of the hardest materials to print today are also the most commonly used in current manufacturing processes. That includes rubber-like materials such as silicone, and high-temperature materials such as epoxy, which are often used for insulating electronics and in a variety of consumer, health, and industrial products.
These materials are usually difficult to print, leading to uneven distribution and print process failures like clogging. They also tend to shrink or round at the edges over time. Inkbit co-founders Wojciech Matusik, an associate professor of electrical engineering and computer science, Javier Ramos BS ’12 SM ’14, Wenshou Wang, and Kiril Vidimče SM ’14 have been working on these problems for years in Matusik’s Computational Fabrications Group within the Computer Science and Artificial Intelligence Laboratory (CSAIL).
In 2015, the co-founders were among a group of researchers that created a relatively low-cost, precise 3-D printer that could print a record 10 materials at once by leveraging machine vision. The feat got the attention of many large companies interested in transitioning production to 3-D printing, and the following year the four engineers received support from the Deshpande Center to commercialize their idea of joining machine vision with 3-D printing.
At MIT, Matusik’s research group used a simple 3-D scanner to track its machine’s progress. For Inkbit’s first printer, the founders wanted to dramatically improve “the eyes” of their machine. They decided to use an optical coherence tomography (OCT) scanner, which uses long wavelengths of light to see through the surface of materials and scan layers of material at a resolution of a fraction of the width of a human hair.
Because OCT scanners are traditionally only used by ophthalmologists to examine below the surface of patients’ eyes, the only ones available were far too slow to scan each layer of a 3-D printed part — so Inkbit’s team “bit the bullet,” as Marini describes it, and built a custom OCT scanner he says is 100 times faster than anything else on the market today.
When a layer is printed and scanned, the company’s proprietary machine-vision and machine-learning systems automatically correct any errors in real time and proactively compensate for the warping and shrinkage behavior of a fickle material. Those processes further expand the range of materials the company is able to print with by removing the rollers and scrapers used by some other printers to ensure precision, which tend to jam when used with difficult-to-print materials.
The system is designed to allow users to prototype and manufacture new objects on the same machine. Inkbit’s current industrial printer has 16 print heads to create multimaterial parts and a print block big enough to produce hundreds of thousands of fist-sized products each year (or smaller numbers of larger products). The machine’s contactless inkjet design means increasing the size of later iterations will be as simple as expanding the print block.
“Before, people could make prototypes with multimaterial printers, but they couldn’t really manufacture final parts,” Matusik says, noting that the postprocessing of Inkbit’s parts can be fully automated. “This is something that’s not possible using any other manufacturing methods.”
Inkbit's 3-D printer can produce multimaterial objects (like the pinch valve shown above) at high volumes. Courtesy of Inkbit
The novel capabilities of Inkbit’s machine mean that some of the materials the founders want to print with are not available, so the company has created some of its own chemistries to push the performance of their products to the limit. A proprietary system for mixing two materials just before printing will be available on the printers Inkbit ships next year. The two-part chemistry mixing system will allow the company to print a broader range of engineering-grade materials.
Johnson & Johnson, a strategic partner of Inkbit, is in the process of acquiring one of the first printers. The MIT Startup Exchange Accelerator (STEX25) has also been instrumental in exposing Inkbit to leading corporations such as Amgen, Asics, BAE Systems, Bosch, Chanel, Lockheed Martin, Medtronic, Novartis, and others.
Today, the founders spend a lot of their time educating product design teams that have never been able to 3-D print their products before — let alone incorporate electronic components into 3-D-printed parts.
It may be a while before designers and inventors take full advantage of the possibilities unlocked by integrated, multimaterial 3-D printing. But for now, Inkbit is working to ensure that, when that future comes, the most imaginative people will have a machine to work with.
“Some of this is so far ahead of its time,” Matusik says. “I think it will be really fascinating to see how people are going to use it for final products.”
Researchers from MIT and elsewhere have developed an interactive tool that, for the first time, lets users see and control how automated machine-learning systems work. The aim is to build confidence in these systems and find ways to improve them.
Designing a machine-learning model for a certain task — such as image classification, disease diagnoses, and stock market prediction — is an arduous, time-consuming process. Experts first choose from among many different algorithms to build the model around. Then, they manually tweak “hyperparameters” — which determine the model’s overall structure — before the model starts training.
Recently developed automated machine-learning (AutoML) systems iteratively test and modify algorithms and those hyperparameters, and select the best-suited models. But the systems operate as “black boxes,” meaning their selection techniques are hidden from users. Therefore, users may not trust the results and can find it difficult to tailor the systems to their search needs.
In a paper presented at the ACM CHI Conference on Human Factors in Computing Systems, researchers from MIT, the Hong Kong University of Science and Technology (HKUST), and Zhejiang University describe a tool that puts the analyses and control of AutoML methods into users’ hands. Called ATMSeer, the tool takes as input an AutoML system, a dataset, and some information about a user’s task. Then, it visualizes the search process in a user-friendly interface, which presents in-depth information on the models’ performance.
“We let users pick and see how the AutoML system works,” says co-author Kalyan Veeramachaneni, a principal research scientist in the MIT Laboratory for Information and Decision Systems (LIDS), who leads the Data to AI group. “You might simply choose the top-performing model, or you might have other considerations or use domain expertise to guide the system to search for some models over others.”
In case studies with science graduate students, who were AutoML novices, the researchers found about 85 percent of participants who used ATMSeer were confident in the models selected by the system. Nearly all participants said using the tool made them comfortable enough to use AutoML systems in the future.
“We found people were more likely to use AutoML as a result of opening up that black box and seeing and controlling how the system operates,” says Micah Smith, a graduate student in the Department of Electrical Engineering and Computer Science (EECS) and a researcher in LIDS.
“Data visualization is an effective approach toward better collaboration between humans and machines. ATMSeer exemplifies this idea,” says lead author Qianwen Wang of HKUST. “ATMSeer will mostly benefit machine-learning practitioners, regardless of their domain, [who] have a certain level of expertise. It can relieve the pain of manually selecting machine-learning algorithms and tuning hyperparameters.”
Joining Smith, Veeramachaneni, and Wang on the paper are: Yao Ming, Qiaomu Shen, Dongyu Liu, and Huamin Qu, all of HKUST; and Zhihua Jin of Zhejiang University.
Tuning the model
At the core of the new tool is a custom AutoML system, called “Auto-Tuned Models” (ATM), developed by Veeramachaneni and other researchers in 2017. Unlike traditional AutoML systems, ATM fully catalogues all search results as it tries to fit models to data.
ATM takes as input any dataset and an encoded prediction task. The system randomly selects an algorithm class — such as neural networks, decision trees, random forest, and logistic regression — and the model’s hyperparameters, such as the size of a decision tree or the number of neural network layers.
Then, the system runs the model against the dataset, iteratively tunes the hyperparameters, and measures performance. It uses what it has learned about that model’s performance to select another model, and so on. In the end, the system outputs several top-performing models for a task.
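The search loop described above can be sketched in a few lines. This is a toy illustration of an ATM-style search, not the actual ATM code: the algorithm names, hyperparameter ranges, and scoring function are all made up for the example, and real ATM trains models against data rather than calling a synthetic scorer.

```python
import random

# Toy stand-in for ATM's search loop: each trial picks an algorithm
# class and hyperparameters at random, "trains" it (here, a synthetic
# scoring function), and logs the result as one data point —
# algorithm, hyperparameters, and performance.

SEARCH_SPACE = {
    "decision_tree": {"max_depth": range(1, 20)},
    "neural_network": {"layers": range(1, 10)},
}

def toy_score(algorithm, params):
    # Synthetic performance in [0, 1] that peaks at a "sweet spot"
    # per algorithm; a real system would measure validation accuracy.
    value = next(iter(params.values()))
    target = 8 if algorithm == "decision_tree" else 4
    return max(0.0, 1.0 - abs(value - target) / 10)

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    trials = []  # full catalogue of every model tried
    for _ in range(n_trials):
        algorithm = rng.choice(sorted(SEARCH_SPACE))
        params = {name: rng.choice(list(choices))
                  for name, choices in SEARCH_SPACE[algorithm].items()}
        trials.append({"algorithm": algorithm, "params": params,
                       "score": toy_score(algorithm, params)})
    best = max(trials, key=lambda t: t["score"])
    return trials, best

trials, best = random_search(50)
print(best["algorithm"], best["params"], round(best["score"], 2))
```

Because every trial is kept rather than discarded, the full list of records is exactly the kind of per-model data that a tool like ATMSeer can plot and filter.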
The trick is that each model can essentially be treated as one data point with a few variables: algorithm, hyperparameters, and performance. Building on that work, the researchers designed a system that plots the data points and variables on designated graphs and charts. From there, they developed a separate technique that also lets them reconfigure that data in real time. “The trick is that, with these tools, anything you can visualize, you can also modify,” Smith says.
Similar visualization tools are tailored toward analyzing only one specific machine-learning model, and allow limited customization of the search space. “Therefore, they offer limited support for the AutoML process, in which the configurations of many searched models need to be analyzed,” Wang says. “In contrast, ATMSeer supports the analysis of machine-learning models generated with various algorithms.”
User control and confidence
ATMSeer’s interface consists of three parts. A control panel allows users to upload datasets and an AutoML system, and start or pause the search process. Below that is an overview panel that shows basic statistics — such as the number of algorithms and hyperparameters searched — and a “leaderboard” of top-performing models in descending order. “This might be the view you’re most interested in if you’re not an expert diving into the nitty gritty details,” Veeramachaneni says.
Similar visualization tools present this basic information, but without customization capabilities. ATMSeer includes an “AutoML Profiler,” with panels containing in-depth information about the algorithms and hyperparameters, which can all be adjusted. One panel represents all algorithm classes as histograms — a bar chart that shows the distribution of the algorithm’s performance scores, on a scale of 0 to 10, depending on their hyperparameters. A separate panel displays scatter plots that visualize the tradeoffs in performance for different hyperparameters and algorithm classes.
Case studies with machine-learning experts, who had no AutoML experience, revealed that user control does help improve the performance and efficiency of AutoML selection. User studies with 13 graduate students in diverse scientific fields — such as biology and finance — were also revealing. Results indicate three major factors — number of algorithms searched, system runtime, and finding the top-performing model — determined how users customized their AutoML searches. That information can be used to tailor the systems to users, the researchers say.
“We are just starting to see the beginning of the different ways people use these systems and make selections,” Veeramachaneni says. “That’s because now this information is all in one place, and people can see what’s going on behind the scenes and have the power to control it.”
If you’ve ever wondered what a loaf of bread would look like as a cat, edges2cats is for you. The program that turns sketches into images of cats is one of many whimsical creations inspired by Phillip Isola’s image-to-image translation software released in the early days of generative adversarial networks, or GANs. In a 2016 paper, Isola and his colleagues showed how a new type of GAN could transform a hand-drawn shoe into its fashion-photo equivalent, or turn an aerial photo into a grayscale map. Later, the researchers showed how landscape photos could be reimagined in the impressionist brushstrokes of Monet or Van Gogh. Now an assistant professor in MIT’s Department of Electrical Engineering and Computer Science, Isola continues to explore what GANs can do.
GANs work by pairing two neural networks, trained on a large set of images. One network, the generator, outputs an image patterned after the training examples. The other network, the discriminator, rates how well the generator’s output image resembles the training data. If the discriminator can tell it’s a fake, the generator tries again and again until its output images are indistinguishable from the examples. When Isola first heard of GANs, he was experimenting with nearest-neighbor algorithms to try to infer the underlying structure of objects and scenes.
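The competition described above boils down to two opposing loss functions. The snippet below is a minimal sketch of the standard GAN objective, with plain numbers standing in for the discriminator's outputs; it omits the networks and gradient updates entirely.

```python
import math

# The discriminator D outputs the probability that a sample is real.
# D and the generator G optimize opposing losses (the minimax game).
# The probability values below are placeholders for network outputs.

def discriminator_loss(d_real, d_fake):
    # D wants real samples scored near 1 and fakes near 0.
    return -math.log(d_real) - math.log(1.0 - d_fake)

def generator_loss(d_fake):
    # G wants the discriminator to score its fakes as real (d_fake -> 1).
    return -math.log(d_fake)

# Early in training, D easily spots fakes (d_fake is small), so G's
# loss is large and G has a strong signal to improve.
print(round(generator_loss(0.1), 3))  # 2.303
# If G improves until d_fake = 0.5, the discriminator is at chance.
print(round(generator_loss(0.5), 3))  # 0.693
```

When the discriminator can no longer do better than chance (every output near 0.5), the generator's samples are, by this measure, indistinguishable from the training examples.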
To connect the growing number of GAN enthusiasts at MIT and beyond, Isola has recently helped to organize GANocracy, a day of talks, tutorials, and posters being held at MIT on May 31 that is co-sponsored by the MIT Quest for Intelligence and MIT-IBM Watson AI Lab. Isola recently spoke about the future of GANs.
Q: Your image-to-image translation paper has more than 2,000 citations. What made it so popular?
A: It was one of the earliest papers to show that GANs are useful for predicting visual data. We showed that this setting is very general, and can be thought of as translating between different visualizations of the world, which we called image-to-image translation. GANs were originally proposed as a model for producing realistic images from scratch. But the most useful application may be structured prediction, which is what GANs are mostly being used for these days.
Q: GANs are easily customized and shared on social media. Any favorites among these projects?
A: #Edges2cats is probably my favorite, and it helped to popularize the framework early on. Architect Nono Martínez Alonso has used pix2pix for exploring interesting tools for sketch-based design. I like everything by Mario Klingemann; Alternative Face is especially thought-provoking. It puts one person’s words into someone else’s mouth, hinting at a potential future of “alternative facts.” Scott Eaton is pushing the limits of GANs by translating sketches into 3-D sculptures.
Q: What other GAN art grabs you?
A: I really like all of it. One remarkable example is GANbreeder. It’s a human-curated evolution of GAN-generated images. The crowd chooses which images to breed or kill off. Over many generations, we end up with beautiful and unexpected images.
Q: How are GANs being used beyond art?
A: In medical imaging, they’re being used to generate CT scans from MRIs. There’s potential there, but it can be easy to misinterpret the results: GANs are making predictions, not revealing the truth. We don't yet have good ways to measure the uncertainty of their predictions. I'm also excited about the use of GANs for simulations. Robots are often trained in simulators to reduce costs, creating complications when we deploy them in the real world. GANs can help bridge the gap between simulation and reality.
Q: Will GANs redefine what it means to be an artist?
A: I don't know, but it's a super-interesting question. Several of our GANocracy speakers are artists, and I hope they will touch on this. GANs and other generative models are different from other kinds of algorithmic art. They are trained to imitate, so the people being imitated probably deserve some credit. The art collective Obvious recently sold a GAN image at Christie's for $432,500. Obvious selected the image, signed and framed it, but the code was derived from work by then-17-year-old Robbie Barrat. Ian Goodfellow helped develop the underlying algorithm.
Q: Where is the field heading?
A: As amazing as GANs are, they are just one type of generative model. GANs might eventually fade in popularity, but generative models are here to stay. As models of high-dimensional structured data, generative models get close to what we mean when we say “create,” “visualize,” and “imagine.” I think they will be used more and more to approximate capabilities that still seem uniquely human. But GANs do have some unique properties. For one, they solve the generative modeling problem via a two-player competition, creating a generator-discriminator arms race that leads to emergent complexity. Arms races show up across machine learning, including in the AI that achieved superhuman abilities in the game Go.
Q: Are you worried about the potential abuse of GANs?
A: I’m definitely concerned about the use of GANs to generate and spread misleading content, or so-called fake news. GANs make it a lot easier to create doctored photos and videos, where you no longer have to be a video editing expert to make it look like a politician is saying something they never actually said.
Q: You and the other GANocracy organizers are advocating for so-called GANtidotes. Why?
A: We would like to inoculate society against the misuse of GANs. Everyone could just stop trusting what we see online, but then we’d risk losing touch with reality. I’d like to preserve a future in which “seeing is believing.” Luckily, many people are working on technical antidotes that range from detectors that seek out the telltale artifacts in a GAN-manipulated image to cryptographic signatures that verify that a photo has not been edited since it was taken. There are a lot of ideas out there, so I’m optimistic it can be solved.
Voice assistants like Siri and Alexa can tell the weather and crack a good joke, but any 8-year-old can carry on a better conversation.
The deep learning models that power Siri and Alexa learn to understand our commands by picking out patterns in sequences of words and phrases. Their narrow, statistical understanding of language stands in sharp contrast to our own creative, spontaneous ways of speaking, a skill that starts developing even before we are born, while we're still in the womb.
To give computers some of our innate feel for language, researchers have started training deep learning models on the grammatical rules that most of us grasp intuitively, even if we never learned how to diagram a sentence in school. Grammatical constraints seem to help the models learn faster and perform better, but because neural networks reveal very little about their decision-making process, researchers have struggled to confirm that the gains are due to the grammar, and not the models’ expert ability at finding patterns in sequences of words.
Now psycholinguists have stepped in to help. To peer inside the models, researchers have taken psycholinguistic tests originally developed to study human language understanding and adapted them to probe what neural networks know about language. In a pair of papers to be presented in June at the North American Chapter of the Association for Computational Linguistics conference, researchers from MIT, Harvard University, University of California, IBM Research, and Kyoto University have devised a set of tests to tease out the models’ knowledge of specific grammatical rules. They find evidence that grammar-enriched deep learning models comprehend some fairly sophisticated rules, performing better than models trained on little-to-no grammar, and using a fraction of the data.
“Grammar helps the model behave in more human-like ways,” says Miguel Ballesteros, an IBM researcher with the MIT-IBM Watson AI Lab, and co-author of both studies. “The sequential models don’t seem to care if you finish a sentence with a non-grammatical phrase. Why? Because they don’t see that hierarchy.”
As a postdoc at Carnegie Mellon University, Ballesteros helped develop a method for training modern language models on sentence structure called recurrent neural network grammars, or RNNGs. In the current research, he and his colleagues exposed the RNNG model, and similar models with little-to-no grammar training, to sentences with good, bad, or ambiguous syntax. When human subjects are asked to read sentences that sound grammatically off, their surprise is registered by longer response times. For computers, surprise is expressed in probabilities; when low-probability words appear in the place of high-probability words, researchers give the models a higher surprisal score.
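The surprisal measure above is simply the negative log probability the model assigns to a word in context. Here is a minimal illustration of the computation; the probabilities are invented for the example, not taken from the RNNG model.

```python
import math

# Surprisal: how "unexpected" a word is given the model's predicted
# probability for it. Measured in bits when using log base 2.

def surprisal(prob, base=2):
    return -math.log(prob, base)

# A likely continuation gets low surprisal...
print(round(surprisal(0.25), 3))       # 2.0 bits
# ...while a low-probability (e.g., ungrammatical) word scores high.
print(round(surprisal(0.001953125), 3))  # 9.0 bits
```

Summing per-word surprisals over a sentence gives an overall measure of how anomalous the model finds it, which is what lets researchers compare model "surprise" with human reading times.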
They found that the best-performing model — the grammar-enriched RNNG model — showed greater surprisal when exposed to grammatical anomalies; for example, when the word “that” improperly appears instead of “what” to introduce an embedded clause: “I know what the lion devoured at sunrise” is a perfectly natural sentence, but “I know that the lion devoured at sunrise” sounds like it has something missing — because it does.
Linguists call this type of construction a dependency between a filler (a word like who or what) and a gap (the absence of a phrase where one is typically required). Even when more complicated constructions of this type are shown to grammar-enriched models, they — like native speakers of English — clearly know which ones are wrong.
For example, “The policeman who the criminal shot the politician with his gun shocked during the trial” is anomalous; the gap corresponding to the filler “who” should come after the verb, “shot,” not “shocked.” Rewriting the sentence to change the position of the gap, as in “The policeman who the criminal shot with his gun shocked the jury during the trial,” is longwinded, but perfectly grammatical.
“Without being trained on tens of millions of words, state-of-the-art sequential models don’t care where the gaps are and aren’t in sentences like those,” says Roger Levy, a professor in MIT’s Department of Brain and Cognitive Sciences, and co-author of both studies. “A human would find that really weird, and, apparently, so do grammar-enriched models.”
Bad grammar, of course, not only sounds weird, it can turn an entire sentence into gibberish, underscoring the importance of syntax in cognition, and to psycholinguists who study syntax to learn more about the brain’s capacity for symbolic thought. “Getting the structure right is important to understanding the meaning of the sentence and how to interpret it,” says Peng Qian, a graduate student at MIT and co-author of both studies.
The researchers plan to next run their experiments on larger datasets and find out if grammar-enriched models learn new words and phrases faster. Just as submitting neural networks to psychology tests is helping AI engineers understand and improve language models, psychologists hope to use this information to build better models of the brain.
“Some component of our genetic endowment gives us this rich ability to speak,” says Ethan Wilcox, a graduate student at Harvard and co-author of both studies. “These are the sorts of methods that can produce insights into how we learn and understand language when our closest kin cannot.”